Kubernetes Helm Testing #317

Open
odatskov opened this issue Jul 10, 2020 · 18 comments

Comments

@odatskov

Dear Developers,

I am currently testing running a Singularity registry on Kubernetes resources. I have ported your Docker Compose setup into a basic Helm chart. I have three questions:

  • The obvious one: is there already a chart or are you working on one (so as not to duplicate effort)?
  • Based on the storage docs: for recent container versions, does /var/www/images still need to be persistent storage?
  • Worker Redis dependency: is it hardcoded to be a local service, or can it also be configured to use an external one?
@vsoch
Member

vsoch commented Jul 10, 2020

It’s so great that you are doing this! There isn’t a chart, and it would be great to have one. The images folder is kept for backwards compatibility, and I believe it might still be used if you upload a container directly in the interface (do you have it running, and could you test?). I also don’t see why the Redis worker couldn’t be configured to be external.

Really looking forward to checking this out!

@odatskov
Author

Most excellent! Thank you for such a quick response. Indeed, I am running on local K8s infrastructure and will definitely check the interactive upload.

@odatskov
Author

Tested with persistent storage and GitHub auth. Singularity shub pull seems fine; however, push is giving a bit of trouble: singularity doesn't like the token file (signed token?), and with sregistry push I get a "certificate verify failed" exception (even with SREGISTRY_REGISTRY_NOHTTPS=true).

In regards to previous comments:

  • Interactive webui push: indeed the sif image can be found in /var/www/images/
  • I found that a custom Redis URL can be set through the REDIS_URL environment variable.
  • settings/config.py sets DISABLE_BUILDING twice: is that a duplicate, or expected?
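A minimal sketch of how a settings module could pick up an external Redis from the REDIS_URL environment variable and fall back to the in-cluster service otherwise. The fallback URL `redis://redis:6379/0` and the function name `redis_url` are assumptions for illustration, not sregistry's actual settings code.

```python
import os
from urllib.parse import urlparse

# Hypothetical fallback: the local Redis service name used by the Compose/Helm setup.
DEFAULT_REDIS_URL = "redis://redis:6379/0"


def redis_url(env=None):
    """Return REDIS_URL if set, else the local default (sketch, not sregistry's code)."""
    env = os.environ if env is None else env
    url = env.get("REDIS_URL", DEFAULT_REDIS_URL)
    # Reject obviously wrong values early instead of failing inside the worker.
    scheme = urlparse(url).scheme
    if scheme not in ("redis", "rediss"):
        raise ValueError(f"unsupported Redis URL scheme: {scheme!r}")
    return url
```

In a Helm chart, the external URL would then be passed as an `env` entry on the containers that talk to Redis, leaving the chart's own Redis optional.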

@vsoch
Member

vsoch commented Jul 16, 2020

Ah, I bet I know the issue! For the library endpoint you actually can't use http, so the way I test locally is by compiling singularity with remote.go set to http. Just change this line from https to http, and then build. The instructions are in the docs here. The DISABLE_BUILDING repeat is an error! I have a PR open now, and I'll remove the extra one.
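Duplicate settings like the repeated DISABLE_BUILDING are easy to miss by eye. A minimal sketch that lists module-level names assigned more than once, using only the standard `ast` module; the function name is hypothetical. It needs plain Python, so run it on settings/config.py itself or on a rendered chart template, not on a template that still contains `{{ ... }}` placeholders.

```python
import ast
from collections import defaultdict


def duplicate_settings(source):
    """Map each module-level name assigned more than once to its line numbers."""
    lines_by_name = defaultdict(list)
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign):
            targets = [node.target]
        else:
            continue  # imports, functions, etc. are not settings assignments
        for target in targets:
            if isinstance(target, ast.Name):
                lines_by_name[target.id].append(node.lineno)
    return {name: lines for name, lines in lines_by_name.items() if len(lines) > 1}


if __name__ == "__main__":
    sample = "DISABLE_BUILDING = True\nDEBUG = False\nDISABLE_BUILDING = True\n"
    print(duplicate_settings(sample))  # {'DISABLE_BUILDING': [1, 3]}
```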

@vsoch
Member

vsoch commented Sep 17, 2020

Any updates on working on a chart?

@odatskov
Author

I am indeed testing the chart on our infra. Will have results in a bit.

@ifelsefi

ifelsefi commented Apr 6, 2021

Any update with the chart? I would like to use this as well.

@vsoch
Member

vsoch commented Apr 6, 2021

I suspect @odatskov is not working on it anymore. @odatskov - can you share what you started so possibly someone else can take over? And if you had any kind of issues, tell us and maybe we can help?

@ifelsefi

ifelsefi commented Apr 6, 2021

I forked it.

The image quay.io/vanessa/sregistry_nginx:1.1.34 fails with:

2021/04/06 19:32:28 [emerg] 1#1: invalid number of arguments in "client_body_timeout" directive in /etc/nginx/conf.d/default.conf:7

I deleted that directive entirely, and then got new errors in the same container. It would be nice if @odatskov could share feedback, but they seem to have been AFK for a long time.

@vsoch
Member

vsoch commented Apr 6, 2021

You don't technically need sregistry_nginx anymore; that was from when we had a custom nginx module to do uploads. The current sregistry uses the Sylabs library API with a MinIO storage backend. So I'd remove that image and use a standard nginx.

@vsoch
Member

vsoch commented Apr 6, 2021

There is also some "wisdom" on Server Fault if you want to try debugging it (without removing it): https://serverfault.com/questions/786982/nginx-conditional-proxy-invalid-number-of-arguments-in-try-files-directive

@ifelsefi

ifelsefi commented Jun 17, 2021

Hmm, OK. I am using an nginx image with your release 1.1.34.

The uwsgi container gives me this django error:

Thu Jun 17 14:02:59 2021 - Python main interpreter initialized at 0x55866ec883e0
Thu Jun 17 14:02:59 2021 - uWSGI running as root, you can use --uid/--gid/--chroot options
Thu Jun 17 14:02:59 2021 - *** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
Thu Jun 17 14:02:59 2021 - python threads support enabled
Thu Jun 17 14:02:59 2021 - your server socket listen backlog is limited to 100 connections
Thu Jun 17 14:02:59 2021 - your mercy for graceful operations on workers is 60 seconds
Thu Jun 17 14:02:59 2021 - mapped 521440 bytes (509 KB) for 16 cores
Thu Jun 17 14:02:59 2021 - *** Operational MODE: preforking+threaded ***
Thu Jun 17 09:02:59 2021 - WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x55866ec883e0 pid: 52 (default app)
Thu Jun 17 09:02:59 2021 - uWSGI running as root, you can use --uid/--gid/--chroot options
Thu Jun 17 09:02:59 2021 - *** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
Thu Jun 17 09:02:59 2021 - *** uWSGI is running in multiple interpreter mode ***
Thu Jun 17 09:02:59 2021 - spawned uWSGI master process (pid: 52)
Thu Jun 17 09:02:59 2021 - spawned uWSGI worker 1 (pid: 54, cores: 4)
Thu Jun 17 09:02:59 2021 - spawned uWSGI worker 2 (pid: 57, cores: 4)
Thu Jun 17 09:02:59 2021 - spawned uWSGI worker 3 (pid: 59, cores: 4)
Thu Jun 17 09:02:59 2021 - spawned uWSGI worker 4 (pid: 63, cores: 4)

[ more stuff here ] 

[pid: 63|app: 0|req: 50/54] 10.69.24.84 () {32 vars in 361 bytes} [Thu Jun 17 09:07:28 2021] GET / => generated 8785 bytes in 6 msecs (HTTP/1.1 200) 4 headers in 124 bytes (1 switches on core 2)
Forbidden (Permission denied): /
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/site-packages/django/core/handlers/base.py", line 115, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/local/lib/python3.5/site-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.5/site-packages/ratelimit/decorators.py", line 29, in _wrapped
    raise Ratelimited()
ratelimit.exceptions.Ratelimited
[pid: 63|app: 0|req: 51/55] 10.69.24.84 () {32 vars in 361 bytes} [Thu Jun 17 09:07:33 2021] GET / => generated 22 bytes in 3 msecs (HTTP/1.1 403) 3 headers in 100 bytes (1 switches on core 3)
Forbidden (Permission denied): /
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/site-packages/django/core/handlers/base.py", line 115, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/local/lib/python3.5/site-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.5/site-packages/ratelimit/decorators.py", line 29, in _wrapped
    raise Ratelimited()
ratelimit.exceptions.Ratelimited

I will keep checking the config as this would save us a lot of time. It is a pain to download images from our private docker registry.

Edit:

I redeployed the pods and now they're up, but it seems nginx can't talk to uwsgi:


2021/06/17 14:12:43 [error] 7#7: *1 connect() failed (113: Host is unreachable) while connecting to upstream, client: 10.69.24.84, server: localhost, request: "GET / HTTP/1.1", upstream: "uwsgi://172.30.4.184:3031", host: "172.29.5.219:80"
10.69.24.84 - - [17/Jun/2021:14:12:43 +0000] "GET / HTTP/1.1" 502 173 "-" "kube-probe/1.17" "-"
2021/06/17 14:12:48 [error] 7#7: *4 connect() failed (113: Host is unreachable) while connecting to upstream, client: 10.69.24.84, server: localhost, request: "GET / HTTP/1.1", upstream: "uwsgi://172.30.4.184:3031", host: "172.29.5.219:80"
10.69.24.84 - - [17/Jun/2021:14:12:48 +0000] "GET / HTTP/1.1" 502 173 "-" "kube-probe/1.17" "-"
10.69.24.84 - - [17/Jun/2021:14:12:53 +0000] "GET / HTTP/1.1" 200 8785 "-" "kube-probe/1.17" "-"
10.69.24.84 - - [17/Jun/2021:14:12:58 +0000] "GET / HTTP/1.1" 403 22 "-" "kube-probe/1.17" "-"
10.69.24.84 - - [17/Jun/2021:14:13:03 +0000] "GET / HTTP/1.1" 200 8785 "-" "kube-probe/1.17" "-"
10.69.24.84 - - [17/Jun/2021:14:13:28 +0000] "GET / HTTP/1.1" 403 22 "-" "kube-probe/1.17" "-"
10.69.24.84 - - [17/Jun/2021:14:13:33 +0000] "GET / HTTP/1.1" 403 22 "-" "kube-probe/1.17" "-"

I have that port defined in the Helm values:

  # SRegistry image, and the port exposed by the uwsgi container
  sregistry:
    image: "quay.io.artifactory.quantlab.com/vanessa/sregistry"
    tag: ""
    port: 3031

@ifelsefi

ifelsefi commented Jun 17, 2021

So uwsgi fails with the "permission denied /" error, but the k8s health check on uwsgi doesn't know about that, so the container stays up in a bad state and nginx keeps trying to connect. My health check on the nginx container does detect the bad state when nginx can't proxy upstream (requests return 403 once uwsgi fails), so that container keeps restarting. So I need to fix the permission denied issue on uwsgi for "/".
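The nginx log shows "Host is unreachable" for upstream uwsgi://172.30.4.184:3031 while the uwsgi container was still considered healthy. A minimal sketch of a check aimed directly at the uwsgi socket port (3031, from the Helm values) rather than the HTTP front end; the function name is hypothetical. It only confirms that something accepts TCP connections on that port, so it would catch uwsgi not listening but not application-level 403s. Kubernetes' built-in `tcpSocket` probe performs the same kind of check without a script.

```python
import socket


def uwsgi_port_open(host, port=3031, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and raises OSError on refusal,
        # unreachable hosts, or timeouts (socket.timeout is an OSError subclass).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```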

@vsoch
Member

vsoch commented Jun 17, 2021

That is rate limiting enforced by the server (see https://singularityhub.github.io/sregistry/2019/rate-limits/). It should be set in settings/config.py. You likely want to change or remove it for this use case. We don't have any official K8s deployment, but if you get something working it would be good to write it up, include the configs here, and either document or automate a way to disable or change the rate limiting.
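A quick back-of-the-envelope check shows why the probe alone can trip the limit. This is a minimal sketch with a simplified parser modeled on django-ratelimit's `count/[multiplier]unit` rate strings (an assumption, not the library's own parser), and it ignores fixed-window alignment. The 5-second probe interval is read off the nginx log timestamps above, and every probe arrives from the same address (10.69.24.84), so all of them count against one per-IP bucket. The "50 per day" figure comes from the chart's config comment and may not match the deployed value.

```python
import re

# Seconds per unit suffix (s, m, h, d), as in rate strings like "50/d" or "5/10m".
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_rate(rate):
    """Parse '50/d' or '5/10m' into (count, window_seconds). Simplified sketch."""
    match = re.fullmatch(r"(\d+)/(\d*)([smhd])", rate)
    if match is None:
        raise ValueError(f"unrecognized rate: {rate!r}")
    count, multiplier, unit = match.groups()
    return int(count), int(multiplier or 1) * UNIT_SECONDS[unit]


def seconds_until_blocked(rate, probe_interval):
    """Seconds after the first probe at which a single-IP probe exceeds the limit.

    Returns None if the probe rate alone never exceeds the limit.
    """
    count, window = parse_rate(rate)
    if window / probe_interval <= count:
        return None
    # Probe n fires at (n - 1) * probe_interval; probe count + 1 is the first rejected.
    return count * probe_interval
```

With those numbers, `seconds_until_blocked("50/d", 5)` returns 250: the 51st probe, a little over four minutes after startup, is the first to get a 403, which matches the pattern of 200s turning into 403s in the logs.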

@ifelsefi
Copy link

Thank you. I wonder if that would cause the permission denied "/" error? I will push my changes to the fork.

@vsoch
Member

vsoch commented Jun 17, 2021

Yes, when the rate limit is exceeded it raises a permission denied (403):

[pid: 63|app: 0|req: 51/55] 10.69.24.84 () {32 vars in 361 bytes} [Thu Jun 17 09:07:33 2021] GET / => generated 22 bytes in 3 msecs (HTTP/1.1 403) 3 headers in 100 bytes (1 switches on core 3)
Forbidden (Permission denied): /

@ifelsefi

Ah, this is an option already with the helm chart created by the author.


    VIEW_RATE_LIMIT = {{ .Values.config.viewRateLimit | quote }}  # The rate limit for each view (django-ratelimit), e.g. "50 per day per ipaddress"
    VIEW_RATE_LIMIT_BLOCK = (
        {{ .Values.config.viewRateBlock }}  # Given that someone goes over, are they blocked for the period?
    )

Thanks for pointing me in the right direction!

@vsoch
Member

vsoch commented Jun 17, 2021

Definitely!

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants