QSFS: Performance numbers are not matching the requirements #24

Open
mohamedamer453 opened this issue May 24, 2022 · 5 comments
@mohamedamer453

mohamedamer453 commented May 24, 2022

According to TC343 & REQ177, the minimum performance for big files (1 GB) should be 100 MB/s, but the actual figure is much lower.

[screenshot: benchmark output for 1 GB files]

According to TC344 & REQ178, the minimum performance for mid files (1 MB) should be 100 MB/s, but the actual figure is lower.

[screenshot: benchmark output for 1 MB files]

According to TC345 & REQ179, the minimum performance for small files (1 KB) should be 1 MB/s, but the actual figure is lower.

[screenshot: benchmark output for 1 KB files]
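
For reference, a minimal way to measure raw write throughput to the QSFS mount looks like the following (a sketch, not necessarily the procedure used for the screenshots above; it assumes the /qsfs mount point from the config below and reads from /dev/zero so the data source itself is not the bottleneck):

dd if=/dev/zero of=/qsfs/big.bin bs=1M count=1024 conv=fsync   # ~1 GB file
dd if=/dev/zero of=/qsfs/mid.bin bs=1M count=1 conv=fsync      # 1 MB file
dd if=/dev/zero of=/qsfs/small.bin bs=1K count=1 conv=fsync    # 1 KB file

conv=fsync makes dd flush the data before reporting the rate, so the number reflects QSFS rather than the page cache.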

TC346 & REQ180 state that it should be possible to create 10 million small files on a single QSFS, but when I tried to do that with the following script, I lost connection to the VM.

for i in {0..10000000}; do dd if=/dev/urandom of="File$(printf "%03d" "$i").txt" bs=1K count=1; done;

[screenshot: 2022-05-23_16-19]

However, when the number was changed to 1 million instead of 10 million, I was able to create the files, as can be seen in the metrics.

[screenshot: 2022-05-24_12-46]

I then tried to create another 1 million files after the first million were done; it didn't crash or lose the connection, but the process was much slower than for the first million.

The same issues also apply to the test cases and requirements TC347, TC348, TC349 & REQ181, REQ182, REQ183. These are the same scenarios as above but with S3 (MinIO), and the results are very similar to the previous scenarios.

[screenshot: 2022-05-23_16-14]

[screenshot: 2022-05-23_16-12]

Config

  • main.tf
terraform {
  required_providers {
    grid = {
      source = "threefoldtech/grid"
    }
  }
}

provider "grid" {
}

locals {
  metas = ["meta1", "meta2", "meta3", "meta4"]
  datas = ["data1", "data2", "data3", "data4",
  "data5", "data6", "data7", "data8",
  "data9", "data10", "data11", "data12",
  "data13", "data14", "data15", "data16",
  "data17", "data18", "data19", "data20",
  "data21", "data22", "data23", "data24"]
}

resource "grid_network" "net1" {
    nodes = [7]
    ip_range = "10.1.0.0/16"
    name = "network"
    description = "newer network"
}

resource "grid_deployment" "d1" {
    node = 7
    dynamic "zdbs" {
        for_each = local.metas
        content {
            name = zdbs.value
            description = "description"
            password = "password"
            size = 10
            mode = "user"
        }
    }
    dynamic "zdbs" {
        for_each = local.datas
        content {
            name = zdbs.value
            description = "description"
            password = "password"
            size = 1
            mode = "seq"
        }
    }
}

resource "grid_deployment" "qsfs" {
  node = 7
  network_name = grid_network.net1.name
  ip_range = lookup(grid_network.net1.nodes_ip_range, 7, "")
  qsfs {
    name = "qsfs"
    description = "description6"
    cache = 10240 # 10 GB
    minimal_shards = 16
    expected_shards = 20
    redundant_groups = 0
    redundant_nodes = 0
    max_zdb_data_dir_size = 512 # 512 MB
    encryption_algorithm = "AES"
    encryption_key = "4d778ba3216e4da4231540c92a55f06157cabba802f9b68fb0f78375d2e825af"
    compression_algorithm = "snappy"
    metadata {
      type = "zdb"
      prefix = "hamada"
      encryption_algorithm = "AES"
      encryption_key = "4d778ba3216e4da4231540c92a55f06157cabba802f9b68fb0f78375d2e825af"
      dynamic "backends" {
          for_each = [for zdb in grid_deployment.d1.zdbs : zdb if zdb.mode != "seq"]
          content {
              address = format("[%s]:%d", backends.value.ips[1], backends.value.port)
              namespace = backends.value.namespace
              password = backends.value.password
          }
      }
    }
    groups {
      dynamic "backends" {
          for_each = [for zdb in grid_deployment.d1.zdbs : zdb if zdb.mode == "seq"]
          content {
              address = format("[%s]:%d", backends.value.ips[1], backends.value.port)
              namespace = backends.value.namespace
              password = backends.value.password
          }
      }
    }
  }


  vms {
    name = "vm"
    flist = "https://hub.grid.tf/tf-official-apps/threefoldtech-ubuntu-20.04.flist"
    cpu = 2
    memory = 1024
    entrypoint = "/init.sh"
    planetary = true
    env_vars = {
      SSH_KEY = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC533B35CELELtgg2d7Tsi5KelLxR0FYUlrcTmRRQuTNP9arP01JYD8iHKqh6naMbbzR8+M0gdPEeRK4oVqQtEcH1C47vLyRI/4DqahAE2nTW08wtJM5uiIvcQ9H2HMzZ3MXYWWlgyHMgW2QXQxzrRS0NXvsY+4wxe97MMZs9MDs+d+X15DfG6JffjMHydi+4tHB50WmHe5tFscBFxLbgDBUxNGiwi3BQc1nWIuYwMMV1GFwT3ndyLAp19KPkEa/dffiqLdzkgs2qpXtfBhTZ/lFeQRc60DHCMWExr9ySDbavIMuBFylf/ZQeJXm9dFXJN7bBTbflZIIuUMjmrI7cU5eSuZqAj5l+Yb1mLN8ljmKSIM3/tkKbzXNH5AUtRVKTn+aEPvJAEYtserAxAP5pjy6nmegn0UerEE3DWEV2kqDig3aPSNhi9WSCykvG2tz7DIr0UP6qEIWYMC/5OisnSGj8w8dAjyxS9B0Jlx7DEmqPDNBqp8UcwV75Cot8vtIac= root@mohamed-Inspiron-3576"
    }
    mounts {
        disk_name = "qsfs"
        mount_point = "/qsfs"
    }
  }
}
output "metrics" {
    value = grid_deployment.qsfs.qsfs[0].metrics_endpoint
}
output "ygg_ip" {
    value = grid_deployment.qsfs.vms[0].ygg_ip
}
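
The deployment itself is brought up with the standard Terraform workflow, and the two outputs above expose the QSFS metrics endpoint and the VM's planetary IP (a sketch; terraform output -raw needs Terraform 0.14+):

terraform init
terraform apply
terraform output -raw metrics   # QSFS/zstor metrics endpoint defined above
terraform output -raw ygg_ip    # planetary (Yggdrasil) IP used to SSH into the VM

The metrics endpoint can be polled while the file-creation tests run to watch progress.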

ramezsaeed transferred this issue from threefoldtech/test_feedback on Jun 1, 2022
@maxux
Collaborator

maxux commented Jun 1, 2022

Can you first check how fast urandom is? It can be slow.

dd if=/dev/urandom of=/dev/null bs=1M count=1000

Can you show how zdbfs is started?

About the 10 million files crash, I'll open an issue on zdbfs and see if I can reproduce it.

For the slowdown, this can happen because adding 1 million files to a single directory is a really bad thing to do, even on a real filesystem. I have already noticed a huge drop after reaching some point but could not reproduce it yet; I'll open an issue for that as well and will keep this thread notified.
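
As an illustration of the single-directory issue, a variant of the test script that spreads the files over, say, 1000 subdirectories could look like this (a sketch with arbitrary directory names; it also reads from /dev/zero so data generation is not the bottleneck):

cd /qsfs
for d in $(seq 0 999); do mkdir -p "dir$d"; done   # pre-create the buckets
for ((i = 0; i < 10000000; i++)); do
    dd if=/dev/zero of="dir$((i % 1000))/File$i.txt" bs=1K count=1 status=none
done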

@maxux
Collaborator

maxux commented Jun 1, 2022

After investigating and some debugging, I think the 10 million files problem you had (connection closed or VM killed) is an out-of-memory issue and not a qsfs/zdbfs issue. When inserting a lot of files into the same directory, memory usage grows (800+ MB in my test), which is probably over the limit you allowed.

I have 7+ million files in a single directory without a crash (but it's slow).

I'm looking into improving that memory usage.

@mohamedamer453
Author

So for the performance numbers, urandom was indeed slow; I tried testing with the command you mentioned and got much faster results.

dd if=/dev/urandom of=/dev/null bs=1M count=1000
  • For large files (1 GB) the speed averaged around 190 MB/s.

    [screenshot]

  • For medium files (1 MB) the speed averaged around 160 MB/s.

    [screenshot]

  • For small files (1 KB) the speed averaged around 1 MB/s.

    [screenshot]

Can you show how zdbfs is started?

zdbfs was started as part of a QSFS deployment from the Terraform grid provider; the full setup can be seen in the included main.tf.

@maxux
Collaborator

maxux commented Jun 2, 2022

You misunderstood the test regarding urandom: the command I asked for was just to ensure you can reach at least 100 MB/s by reading urandom (which seems to be the case). Since that command writes to /dev/null, it never touches qsfs; it was just a sanity check. Thanks :p

For the memory usage, I guess /qsfs mounted in the VM means zdbfs is running inside the VM.
You can confirm by executing ps aux | grep zdbfs. If that's true, your VM has only 1 GB of memory, which could be the issue for 10 million files in a single directory. Try increasing the VM memory to 4 GB or 8 GB and see if you can reproduce the crash :)
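
To check whether the OOM killer is indeed involved, something like the following can be run inside the VM (a sketch; exact log wording varies by kernel):

free -m                                                      # total/used memory in the VM
ps aux | grep '[z]dbfs'                                      # the [z] avoids matching the grep process itself
dmesg | grep -iE 'out of memory|oom-killer|killed process'   # traces left by the OOM killer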

@mohamedamer453
Author

You misunderstood the test regarding urandom: the command I asked for was just to ensure you can reach at least 100 MB/s by reading urandom (which seems to be the case). Since that command writes to /dev/null, it never touches qsfs; it was just a sanity check. Thanks :p

Yep, my bad :D I got confused there for a second.

For the memory usage, I guess /qsfs mounted in the VM have zdbfs running inside the VM. You can confirm by executing ps aux | grep zdbfs. If that's true, your VM have only 1 GB of memory, which could be the issue for 10 millions files in a single directory. Try to increase VM memory to like 4 GB or 8 GB and see if you can reproduce the crash :)

Indeed, /qsfs was mounted in the VM.

[screenshot]

and the result of executing ps aux | grep zdbfs in the VM is:

root@vm:~# ps aux | grep zdbfs
root       172  1.0  0.0   5196  1508 pts/0    S+   10:24   0:00 grep --color=auto zdbfs

After increasing the memory to 4 GB, the command for the 10 million files didn't crash the VM, and the metrics show the files being created, but I'm still not sure whether it will be able to complete the process of creating all 10 million files or not.

LeeSmet transferred this issue from threefoldtech/0-stor_v2 on Nov 17, 2022