
QSFS losing connection during operation #19

Open
mohamedamer453 opened this issue Mar 16, 2022 · 1 comment

After connecting to a VM deployed with QSFS using the Terraform provider on testnet (based on the example from threefoldtech/terraform-provider-grid), with the following QSFS config:

First Example

   name = "qsfs"
   description = "description6"
   cache = 256 
   minimal_shards = 2
   expected_shards = 4
   redundant_groups = 0
   redundant_nodes = 1
   max_zdb_data_dir_size = 64
   encryption_algorithm = "AES"
   encryption_key = "4d778ba3216e4da4231540c92a55f06157cabba802f9b68fb0f78375d2e825af"
   compression_algorithm = "snappy"
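Under the usual erasure-coding reading of these parameters (an assumption here: any minimal_shards of the expected_shards are enough to rebuild the data), this 2-of-4 layout means every byte written consumes roughly twice that in raw zdb space. A quick sketch of the arithmetic:

```shell
# Rough storage-overhead estimate for this config. Assumption: zstor
# erasure-codes each object into expected_shards pieces, of which any
# minimal_shards suffice to rebuild it, so raw usage scales by
# expected_shards / minimal_shards. data_mb is an illustrative payload.
minimal_shards=2
expected_shards=4
data_mb=1024

raw_mb=$(( data_mb * expected_shards / minimal_shards ))
echo "writing ${data_mb} MB consumes roughly ${raw_mb} MB of raw zdb space"
```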

VM Config:

vms {
    name = "vm"
    flist = "https://hub.grid.tf/tf-official-apps/base:latest.flist"
    cpu = 4
    memory = 2048
    entrypoint = "/sbin/zinit init"
    planetary = true
    env_vars = {
      SSH_KEY = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC533B35CELELtgg2d7Tsi5KelLxR0FYUlrcTmRRQuTNP9arP01JYD8iHKqh6naMbbzR8+M0gdPEeRK4oVqQtEcH1C47vLyRI/4DqahAE2nTW08wtJM5uiIvcQ9H2HMzZ3MXYWWlgyHMgW2QXQxzrRS0NXvsY+4wxe97MMZs9MDs+d+X15DfG6JffjMHydi+4tHB50WmHe5tFscBFxLbgDBUxNGiwi3BQc1nWIuYwMMV1GFwT3ndyLAp19KPkEa/dffiqLdzkgs2qpXtfBhTZ/lFeQRc60DHCMWExr9ySDbavIMuBFylf/ZQeJXm9dFXJN7bBTbflZIIuUMjmrI7cU5eSuZqAj5l+Yb1mLN8ljmKSIM3/tkKbzXNH5AUtRVKTn+aEPvJAEYtserAxAP5pjy6nmegn0UerEE3DWEV2kqDig3aPSNhi9WSCykvG2tz7DIr0UP6qEIWYMC/5OisnSGj8w8dAjyxS9B0Jlx7DEmqPDNBqp8UcwV75Cot8vtIac= root@mohamed-Inspiron-3576"
    }
    mounts {
        disk_name = "qsfs"
        mount_point = "/qsfs"
    }
  }

According to TC199, after setting the cache to 256 MB and trying to create a larger (1 GB) file with the following command:

dd if=/dev/urandom of=tmp.txt bs=1M count=1024

First, it gives an I/O error and only creates a ~256 MB file:

dd: error writing 'tmp.txt': I/O error
256+0 records in
255+0 records out

After that, running df -h to check the disk size, or any other command touching /qsfs, gives a socket-not-connected error:

df: /qsfs: Socket not connected

After disconnecting from the VM and reconnecting, /qsfs is still inaccessible:

vm:~# cd /qsfs
-ash: cd: can't cd to /qsfs: Socket not connected

Logs
metrics.txt

Note: after the QSFS disconnected, I tried to redeploy using terraform init && terraform apply -parallelism=1, and it also did not work.
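For reproducibility, the failing write can be scripted. The sketch below is a reproduction template that records dd's exit status and the size of whatever partial file remains; TARGET and the sizes are placeholders (shown here against /tmp with a small file, whereas on the real deployment TARGET would be /qsfs and count would exceed the configured cache):

```shell
# Reproduction sketch: write a file and record what actually landed on
# disk. TARGET and the sizes are placeholders; on /qsfs with cache=256,
# a count exceeding the cache fails partway with an I/O error and leaves
# a roughly cache-sized file behind.
TARGET=/tmp/qsfs-repro
mkdir -p "$TARGET"

dd if=/dev/urandom of="$TARGET/tmp.txt" bs=1M count=8 2>/dev/null
echo "dd exit status: $?"

# Size of whatever was actually written, in bytes.
wc -c < "$TARGET/tmp.txt"
```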

Second Example

Config is the same as the first example except:

    cache = 1024 
    max_zdb_data_dir_size = 256

VM Config is the same as the first example.

According to TC214, after setting the cache to 1 GB and trying to create a 2 GB file with the following command:

dd if=/dev/urandom of=tmp.txt bs=1M count=2048

It gives an I/O error and only creates a ~1 GB file:

dd: error writing 'tmp.txt': I/O error
1008+0 records in
1007+0 records out

This time it did not disconnect. But after running the same command again to create another file, it disconnected during the creation of the file:

dd: error writing 'tmp2.txt': Connection aborted
755+0 records in
754+0 records out

It only created a ~750 MB file and disconnected during creation with "Connection aborted"; after that error, any command on /qsfs gives "Socket not connected".

Disconnecting and reconnecting, as well as redeploying, also did not work.

Logs
metrics.txt

@rkhamis rkhamis transferred this issue from threefoldtech/test_feedback Apr 21, 2022
@muhamadazmy muhamadazmy transferred this issue from threefoldtech/zos Apr 28, 2022
@mohamedamer453 (Author)

The issue is still present even when I tested it with a different config (16+4+4).

The QSFS was working fine, and I was able to read/write/list multiple files with this setup, until I tried creating a 10 GB file. It took a long time to create the file, then produced the error dd: error writing 'tmp24.txt': Connection aborted, and I was no longer able to access the /qsfs mount: df: /qsfs: Socket not connected.
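For reference, under the same erasure-coding assumption as before (any minimal_shards of the expected_shards can rebuild the data), this 16-of-20 layout over the 24 data zdbs of 10 GB each from the main.tf below gives roughly:

```shell
# Back-of-the-envelope capacity for the 16+4+4 config. Assumptions: 24
# seq-mode data zdbs of 10 GB each (as declared in main.tf), and usable
# space scaling by minimal_shards / expected_shards as with standard
# erasure coding.
zdb_count=24
zdb_size_gb=10
minimal_shards=16
expected_shards=20

raw_gb=$(( zdb_count * zdb_size_gb ))
usable_gb=$(( raw_gb * minimal_shards / expected_shards ))
echo "raw: ${raw_gb} GB, usable (approx): ${usable_gb} GB"
```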

A list of some of the files that were created successfully:

vm:/qsfs/randFiles# ls -lS
total 10465287
-rw-r--r--    1 root     root     2147483648 May 23 09:49 tmp20.txt
-rw-r--r--    1 root     root     2147483648 May 23 09:52 tmp21.txt
-rw-r--r--    1 root     root     1073741824 May 23 09:45 tmp10.txt
-rw-r--r--    1 root     root     943718400 May 23 09:43 tmp9.txt
-rw-r--r--    1 root     root     838860800 May 23 09:43 tmp8.txt
-rw-r--r--    1 root     root     734003200 May 23 09:42 tmp7.txt
-rw-r--r--    1 root     root     629145600 May 23 09:55 tmp23.txt
-rw-r--r--    1 root     root     629145600 May 23 09:41 tmp6.txt
-rw-r--r--    1 root     root     524288000 May 23 09:40 tmp5.txt
-rw-r--r--    1 root     root     419430400 May 23 09:39 tmp4.txt
-rw-r--r--    1 root     root     314572800 May 23 09:39 tmp3.txt
-rw-r--r--    1 root     root     209715200 May 23 09:39 tmp2.txt
-rw-r--r--    1 root     root     104857600 May 23 09:38 tmp.txt
-rw-r--r--    1 root     root             6 May 23 09:37 File000.txt

and the file that produced the error:

vm:/qsfs/randFiles# dd if=/dev/urandom of=tmp24.txt bs=1M count=10240
dd: error writing 'tmp24.txt': Connection aborted
9048+0 records in
9047+0 records out
vm:/qsfs/randFiles# df -h
Filesystem                Size      Used Available Use% Mounted on
dev                     480.3M         0    480.3M   0% /dev
run                     486.1M      8.0K    486.1M   0% /run
tmpfs                   486.1M         0    486.1M   0% /dev/shm
/dev/root               476.9G    163.8G    312.2G  34% /
df: /qsfs: Socket not connected
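A quick way to tell whether the mount is dead without waiting for df to fail is to probe it with stat; the helper below is purely illustrative (not part of QSFS tooling), relying on stat returning nonzero when the FUSE mount answers with ENOTCONN ("Socket not connected"):

```shell
# Probe whether a mount point is still responsive. stat fails on a dead
# FUSE mount (ENOTCONN) as well as on a missing path, so a nonzero exit
# status is a good signal that zdbfs has died or the mount is gone.
mount_healthy() {
    stat "$1" >/dev/null 2>&1
}

if mount_healthy /qsfs; then
    echo "/qsfs: responsive"
else
    echo "/qsfs: not responding (possibly a dead FUSE mount)"
fi
```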
main.tf:
terraform {
  required_providers {
    grid = {
      source = "threefoldtech/grid"
    }
  }
}

provider "grid" {
}

locals {
  metas = ["meta1", "meta2", "meta3", "meta4"]
  datas = ["data1", "data2", "data3", "data4",
  "data5", "data6", "data7", "data8",
  "data9", "data10", "data11", "data12",
  "data13", "data14", "data15", "data16",
  "data17", "data18", "data19", "data20",
  "data21", "data22", "data23", "data24"]
}

resource "grid_network" "net1" {
    nodes = [7]
    ip_range = "10.1.0.0/16"
    name = "network"
    description = "newer network"
}

resource "grid_deployment" "d1" {
    node = 7
    dynamic "zdbs" {
        for_each = local.metas
        content {
            name = zdbs.value
            description = "description"
            password = "password"
            size = 10
            mode = "user"
        }
    }
    dynamic "zdbs" {
        for_each = local.datas
        content {
            name = zdbs.value
            description = "description"
            password = "password"
            size = 10
            mode = "seq"
        }
    }
}

resource "grid_deployment" "qsfs" {
  node = 7
  network_name = grid_network.net1.name
  ip_range = lookup(grid_network.net1.nodes_ip_range, 7, "")
  qsfs {
    name = "qsfs"
    description = "description6"
    cache = 10240 # 10 GB
    minimal_shards = 16
    expected_shards = 20
    redundant_groups = 0
    redundant_nodes = 0
    max_zdb_data_dir_size = 512 # 512 MB
    encryption_algorithm = "AES"
    encryption_key = "4d778ba3216e4da4231540c92a55f06157cabba802f9b68fb0f78375d2e825af"
    compression_algorithm = "snappy"
    metadata {
      type = "zdb"
      prefix = "hamada"
      encryption_algorithm = "AES"
      encryption_key = "4d778ba3216e4da4231540c92a55f06157cabba802f9b68fb0f78375d2e825af"
      dynamic "backends" {
          for_each = [for zdb in grid_deployment.d1.zdbs : zdb if zdb.mode != "seq"]
          content {
              address = format("[%s]:%d", backends.value.ips[1], backends.value.port)
              namespace = backends.value.namespace
              password = backends.value.password
          }
      }
    }
    groups {
      dynamic "backends" {
          for_each = [for zdb in grid_deployment.d1.zdbs : zdb if zdb.mode == "seq"]
          content {
              address = format("[%s]:%d", backends.value.ips[1], backends.value.port)
              namespace = backends.value.namespace
              password = backends.value.password
          }
      }
    }
  }
  vms {
    name = "vm"
    flist = "https://hub.grid.tf/tf-official-apps/base:latest.flist"
    cpu = 2
    memory = 1024
    entrypoint = "/sbin/zinit init"
    planetary = true
    env_vars = {
      SSH_KEY = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC533B35CELELtgg2d7Tsi5KelLxR0FYUlrcTmRRQuTNP9arP01JYD8iHKqh6naMbbzR8+M0gdPEeRK4oVqQtEcH1C47vLyRI/4DqahAE2nTW08wtJM5uiIvcQ9H2HMzZ3MXYWWlgyHMgW2QXQxzrRS0NXvsY+4wxe97MMZs9MDs+d+X15DfG6JffjMHydi+4tHB50WmHe5tFscBFxLbgDBUxNGiwi3BQc1nWIuYwMMV1GFwT3ndyLAp19KPkEa/dffiqLdzkgs2qpXtfBhTZ/lFeQRc60DHCMWExr9ySDbavIMuBFylf/ZQeJXm9dFXJN7bBTbflZIIuUMjmrI7cU5eSuZqAj5l+Yb1mLN8ljmKSIM3/tkKbzXNH5AUtRVKTn+aEPvJAEYtserAxAP5pjy6nmegn0UerEE3DWEV2kqDig3aPSNhi9WSCykvG2tz7DIr0UP6qEIWYMC/5OisnSGj8w8dAjyxS9B0Jlx7DEmqPDNBqp8UcwV75Cot8vtIac= root@mohamed-Inspiron-3576"
    }
    mounts {
        disk_name = "qsfs"
        mount_point = "/qsfs"
    }
  }
}
output "metrics" {
    value = grid_deployment.qsfs.qsfs[0].metrics_endpoint
}
output "ygg_ip" {
    value = grid_deployment.qsfs.vms[0].ygg_ip
}
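The metrics output above exposes the QSFS metrics_endpoint, which can be polled while reproducing the failure. A sketch (assumes the deployment is applied and the planetary network is reachable, so the terraform/curl calls are guarded to be no-ops elsewhere); the trailing printf mirrors the format("[%s]:%d", ...) used for the zdb backend addresses, since IPv6 literals must be bracketed:

```shell
# Fetch the QSFS metrics exported by main.tf. Guarded so it is a no-op
# on machines without terraform state or network access.
if command -v terraform >/dev/null 2>&1; then
    endpoint=$(terraform output -raw metrics 2>/dev/null || true)
    if [ -n "$endpoint" ]; then
        curl -s "$endpoint"
    fi
fi

# Bracketed-IPv6 address form, as built by format("[%s]:%d", ip, port)
# in main.tf. The IP and port below are illustrative placeholders.
addr=$(printf '[%s]:%d' '2a02:1802::1' 9900)
echo "$addr"
```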

Metrics
