This repository has been archived by the owner on Jan 27, 2021. It is now read-only.

Osiris tries to activate already active service #63

Open
tenitski opened this issue Nov 28, 2019 · 12 comments

@tenitski

Bug:

The activator works and scales up the deployment; however, it looks like Osiris does not register that the deployment is now running and keeps attempting to scale it up.

This is what is logged by activator for each request:

I1128 00:55:21.673849       1 request_handling.go:10] Request received for for host MY_DOMAIN_HERE
I1128 00:55:21.673865       1 request_handling.go:19] Deployment MY_SERVICE_HERE in namespace MY_NAMESPACE_HERE may require activation
I1128 00:55:21.673872       1 request_handling.go:51] Found NO activation in-progress for deployment MY_SERVICE_HERE in namespace MY_NAMESPACE_HERE
I1128 00:55:21.679078       1 activating.go:29] Activating deployment MY_SERVICE_HERE in namespace MY_NAMESPACE_HERE
I1128 00:55:21.682330       1 deployment_activation.go:116] App pod with ip 172.27.34.162 is in service
@tenitski
Author

Oh, I think I got it: it is a single-service-to-single-hostname mapping.
If I have many services behind an internal router, with the ingress pointing to that router service, it won't work...

@tenitski
Author

tenitski commented Nov 28, 2019

And if I try to use internal hostnames like SERVICE_NAME.NAMESPACE_NAME.svc.cluster.local, it does not work, as Osiris seems to watch only the ingress.

@tenitski
Author

tenitski commented Nov 28, 2019

So I guess the question is: is it possible to get Osiris to work with ClusterIP services using internal DNS names?

@tenitski
Author

While internal hostnames are indexed:

appsByHost[svcFullDNSName] = app

Osiris seems to use the external hostname to pull the service from the index.

@krancour
Contributor

As you've discovered by now, the activator has a map whose keys are all the different hostnames (DNS names) and IPs by which a service might be addressed, and whose values are the corresponding deployments. Have a look at all the config options covered in the README. There are a few annotations that let you explicitly add hostnames to the map that the activator cannot infer on its own.
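To make the lookup described above concrete, here is a minimal Go sketch of such a host index. It is not Osiris's code: the `app` type, the `index`/`addService`/`lookup` names, the exact set of inferred names, and the ClusterIP `10.0.0.12` are hypothetical. The thread itself only confirms that the full cluster DNS name is indexed (`appsByHost[svcFullDNSName] = app`) and that IPs and annotation-supplied hostnames are keys too.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// app is a hypothetical stand-in for whatever the activator stores per
// Osiris-enabled deployment.
type app struct {
	namespace  string
	deployment string
}

// index maps every hostname or IP a service may be addressed by to its app.
type index map[string]*app

// addService registers the names assumed to be inferable for a Service
// (short name, namespace-qualified forms, full cluster DNS name, ClusterIP)
// plus any extra hostnames supplied explicitly, e.g. through an annotation.
func (idx index) addService(ns, svc, clusterIP string, extra []string, a *app) {
	names := []string{
		svc,
		svc + "." + ns,
		svc + "." + ns + ".svc",
		svc + "." + ns + ".svc.cluster.local",
		clusterIP,
	}
	for _, n := range append(names, extra...) {
		if n != "" {
			idx[strings.ToLower(n)] = a
		}
	}
}

// lookup resolves a Host header value, ignoring any port, to an app.
func (idx index) lookup(hostHeader string) (*app, bool) {
	host := hostHeader
	if h, _, err := net.SplitHostPort(hostHeader); err == nil {
		host = h
	}
	a, ok := idx[strings.ToLower(host)]
	return a, ok
}

func main() {
	idx := index{}
	perm := &app{namespace: "example", deployment: "permissions"}
	idx.addService("example", "permissions", "10.0.0.12", nil, perm)

	for _, h := range []string{"permissions.example.svc.cluster.local", "app.example.com"} {
		if a, ok := idx.lookup(h); ok {
			fmt.Printf("%s -> %s/%s\n", h, a.namespace, a.deployment)
		} else {
			fmt.Printf("%s -> no deployment found\n", h)
		}
	}
}
```

The second lookup fails for the same reason as the `No deployment found for host app.example.com` error reported in this thread: nothing registered that key.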

@tenitski
Author

Thanks for getting back to me. I went through all the options 10 times and read half of the source code :). It seems my problem is that Osiris uses only the external hostname when looking up services in the index:

Say the website is app.example.com.
Internally I have a service, router.example.svc.cluster.local, which routes requests to the microservice permissions.example.svc.cluster.local.

In the logs I see only:

I1128 03:02:16.293979       1 request_handling.go:10] Request received for for host app.example.com
E1128 03:02:16.294004       1 proxy.go:97] Error executing start proxy callback for host "app.example.com": No deployment found for host app.example.com

It does not try to look up a service for permissions.example.svc.cluster.local

@tenitski
Author

Can you please point out where the service hostname for a processed request is set?

@krancour
Copy link
Contributor

I think there's a layer of indirection in your example that needs to be explained to me in order for me to help you effectively. You seem to be using some router component to direct traffic. What is Osiris-enabled here? The router or the target? And what is the original request you are making?

@tenitski
Copy link
Author

So the request flow is:

  • Ingress Controller (https://app.example.com/permissions)
    • Ingress (app.example.com)
      • Kubernetes Service linked to a router deployment (router.example.svc.cluster.local; the router maps different paths to the microservices that serve them)
        • Kubernetes Service linked to a microservice deployment (Osiris enabled, permissions.example.svc.cluster.local)

Osiris logs requests associated with app.example.com and says that no deployment was found for this host. However, it does not log any requests related to router.example.svc.cluster.local or permissions.example.svc.cluster.local.

These are the annotations I use on the permissions service:

metadata:
  annotations:
    osiris.deislabs.io/enabled: "true"
    osiris.deislabs.io/deployment: permissions
    osiris.deislabs.io/ingressHostname: "permissions.example.svc.cluster.local"

These are the annotations on the permissions deployment:

metadata:
  name: permissions
  annotations:
    osiris.deislabs.io/enabled: "true"
...
spec:
  template:
    metadata:
      annotations:
        osiris.deislabs.io/enabled: "true"
...

@krancour
Contributor

That's a complex bit of indirection... out of curiosity, why have a "router" behind an ingress controller? Ostensibly, an ingress controller is a router of sorts. Anyway... let's take the router out of the equation for a moment, just for the sake of simplifying what I'm about to say. Fewer hops are easier to understand, right?

So pretend you have just your ingress controller and then your permissions service. Any request to app.example.com still looks like a request for app.example.com when it hits the activator, i.e. the Host header still says app.example.com. That isn't changed by the request's traversal of the ingress controller.
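A small Go experiment illustrates the point about the Host header. It assumes a proxy hop that behaves like Go's standard `httputil.NewSingleHostReverseProxy`, which (per its documentation) does not rewrite the Host header; real ingress controllers and the router in this thread may be configured differently, so treat this as an analogy rather than a description of them. `hostSeenByBackend` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// hostSeenByBackend sends a request with the given Host header through a
// single-host reverse proxy and returns the Host value the backend observed.
func hostSeenByBackend(host string) (string, error) {
	// Backend that simply echoes the Host it received.
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, r.Host)
	}))
	defer backend.Close()

	target, err := url.Parse(backend.URL)
	if err != nil {
		return "", err
	}
	// The proxy rewrites the URL's scheme/host but leaves req.Host alone.
	proxy := httptest.NewServer(httputil.NewSingleHostReverseProxy(target))
	defer proxy.Close()

	req, err := http.NewRequest(http.MethodGet, proxy.URL+"/permissions", nil)
	if err != nil {
		return "", err
	}
	req.Host = host // what the client (e.g. a browser) would send
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	seen, err := hostSeenByBackend("app.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println("backend saw Host:", seen) // backend saw Host: app.example.com
}
```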

So... the request hits the activator looking like a request for app.example.com, but per the configuration you posted, that is not a hostname that the activator would know anything about. How would it?

It seems you may have misused the ingressHostname annotation: you have given it a value you should not need to give it, a value that the activator can infer on its own should be mapped to the permissions deployment. If, however, you use that annotation to tell the activator about app.example.com, you'd be giving the activator new information that would help it match the request with the app.example.com Host header.
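A Go sketch of what the suggested change does to the lookup table. Only the two annotation keys come from this thread; the plain `map[string]string` index, the `register` helper, and the second service `billing` are hypothetical. It also shows the limitation raised earlier in the thread: one hostname key can point at only one deployment.

```go
package main

import (
	"fmt"
	"strings"
)

// Annotation keys as used in this thread.
const (
	deploymentAnnotation      = "osiris.deislabs.io/deployment"
	ingressHostnameAnnotation = "osiris.deislabs.io/ingressHostname"
)

// register indexes a Service under its full cluster DNS name and, when the
// annotation is set, under the explicitly supplied hostname as well.
// A map key holds exactly one value, so a later registration for the same
// hostname silently replaces the earlier one.
func register(idx map[string]string, ns, svc string, annotations map[string]string) {
	deployment := ns + "/" + annotations[deploymentAnnotation]
	idx[svc+"."+ns+".svc.cluster.local"] = deployment
	if h := strings.TrimSpace(annotations[ingressHostnameAnnotation]); h != "" {
		idx[strings.ToLower(h)] = deployment
	}
}

func main() {
	idx := map[string]string{}
	register(idx, "example", "permissions", map[string]string{
		deploymentAnnotation:      "permissions",
		ingressHostnameAnnotation: "app.example.com",
	})
	fmt.Println(idx["app.example.com"]) // example/permissions

	// A second (hypothetical) service claiming the same hostname takes the key.
	register(idx, "example", "billing", map[string]string{
		deploymentAnnotation:      "billing",
		ingressHostnameAnnotation: "app.example.com",
	})
	fmt.Println(idx["app.example.com"]) // example/billing
}
```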

Now... as for why this flow isn't erroring outright and, from the logs you posted, seems to be making an earnest attempt to activate, that could be a bug. The activator definitely shouldn't attempt an activation for a deployment it cannot identify, and if that is happening, it's a mistake. I'd have to dig into the code more to see whether that's actually going on.

There's one other thing lurking in here...

I suspect that you are doing (or intend to do) some path-based routing, e.g. routing not only on hostname but also on path. Is that so? This is not supported (yet?), so that might also be a factor here.

@tenitski
Author

tenitski commented Dec 9, 2019

We use a router behind the ingress because there are a dozen microservices with complex routing rules: path-based, HTTP methods, feature flags, etc. The router handles all of that. These microservices also call each other as part of processing the original request passed in by the ingress, and those calls go via the router too.

So yes, we do have path-based routing; however, since the router uses it to resolve a path to an internal hostname like permissions.example.svc.cluster.local, adding path support to Osiris would not solve our problem.
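Because the router resolves a path to an internal hostname, what the activator sees depends on whether the router also rewrites the Host header when it forwards. A hedged Go sketch, assuming a Go-based router (the router in this thread may be anything): `httputil.ReverseProxy` with a `Rewrite` hook calling `ProxyRequest.SetURL` (Go 1.20+) sets the outbound Host to the target's host, unlike `NewSingleHostReverseProxy`. The `route` type, `newRouter`, and `hostSeenViaRouter` are hypothetical; in a cluster the target would be something like `http://permissions.example.svc.cluster.local`, whereas here it is a local test server.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"strings"
)

// route maps a path prefix to the backend that serves it.
type route struct {
	prefix string
	target *url.URL
}

// newRouter forwards each request to the first route whose prefix matches.
// pr.SetURL also clears the outbound Host, so the backend sees the target's
// host rather than the original one. Unmatched paths keep no scheme/host,
// the transport fails, and ReverseProxy answers 502 Bad Gateway.
func newRouter(routes []route) http.Handler {
	return &httputil.ReverseProxy{
		Rewrite: func(pr *httputil.ProxyRequest) {
			for _, rt := range routes {
				if strings.HasPrefix(pr.In.URL.Path, rt.prefix) {
					pr.SetURL(rt.target)
					return
				}
			}
		},
	}
}

// hostSeenViaRouter sends GET /permissions with inboundHost through the router
// and returns the Host observed by the backend plus the backend's address.
func hostSeenViaRouter(inboundHost string) (seen, targetHost string, err error) {
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, r.Host)
	}))
	defer backend.Close()
	target, err := url.Parse(backend.URL)
	if err != nil {
		return "", "", err
	}

	router := httptest.NewServer(newRouter([]route{{prefix: "/permissions", target: target}}))
	defer router.Close()

	req, err := http.NewRequest(http.MethodGet, router.URL+"/permissions", nil)
	if err != nil {
		return "", "", err
	}
	req.Host = inboundHost
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), target.Host, err
}

func main() {
	seen, targetHost, err := hostSeenViaRouter("app.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(seen == targetHost, seen != "app.example.com") // true true
}
```

With `SetURL` the backend (and thus an activator standing in for it) sees the internal name; dropping the `Rewrite` hook in favour of `NewSingleHostReverseProxy` would pass app.example.com through instead.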

I'm still not sure how the activator works:

  • it builds a map of all possible ways to connect to a deployment (internal hostname, IP, ingress or LB if provided)
  • however, I can't see it using anything other than the hostname of the original request to look up a deployment in that map

Does the activator only listen to requests coming from outside the cluster? That would explain why it does not mention any requests to services that are only available internally.

@krancour
Contributor

Does the activator only listen to requests coming from outside the cluster? That would explain why it does not mention any requests to services that are only available internally.

If things are configured properly, any Osiris-enabled service that has no endpoints in service (i.e. is scaled to zero) automatically gets activator endpoints added. So any traffic that flows through such a service, regardless of where it came from or how it got there, will land at the activator. The main question is really whether the activator will know what to do with the request, and that comes down to 1. configuration and 2. what the Host header (or SNI) says.
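The mechanism described in this comment can be modelled in a few lines of Go. This is a hypothetical model, not how Osiris actually manipulates Kubernetes Endpoints: the `service` type, `endpointsFor`, and the activator address `10.0.0.50` are made up for illustration (the pod IP is the one from the first log excerpt). Which deployment the activator then wakes up is still decided by the Host/SNI lookup, not by which service the traffic passed through.

```go
package main

import "fmt"

// service is a hypothetical model of a Kubernetes Service for this sketch.
type service struct {
	name          string
	osirisEnabled bool
	readyPods     []string
}

// endpointsFor models the behaviour described above: an Osiris-enabled
// service with no ready pods is backed by the activator's endpoints instead.
func endpointsFor(s service, activatorIPs []string) []string {
	if s.osirisEnabled && len(s.readyPods) == 0 {
		return activatorIPs
	}
	return s.readyPods
}

func main() {
	activatorIPs := []string{"10.0.0.50"}
	perm := service{name: "permissions", osirisEnabled: true}
	fmt.Println(endpointsFor(perm, activatorIPs)) // [10.0.0.50]

	perm.readyPods = []string{"172.27.34.162"}
	fmt.Println(endpointsFor(perm, activatorIPs)) // [172.27.34.162]
}
```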
