This repository has been archived by the owner on Jan 19, 2024. It is now read-only.

Add support for nginx-ingress #42

Open
JakeNeyer opened this issue Jul 8, 2020 · 5 comments
Labels
enhancement New feature or request

Comments

@JakeNeyer

This issue has been outlined fairly well here: https://community.neo4j.com/t/cannot-connect-to-cluster-using-k8s-ingress/15476/7

The intent is to expose the services via an nginx-ingress so that connections to bolt can be made from outside the cluster. The current external access documentation here: https://github.com/neo4j-contrib/neo4j-helm/blob/master/tools/external-exposure/EXTERNAL-EXPOSURE.md has a few shortcomings:

  • Additional static IP addresses are required instead of using an existing load balancer that nginx-ingress manages
  • Changing A records for the static IPs could become a nuisance in environments where resources are created and destroyed frequently
  • In my particular case, I am using an internal ELB with nginx-ingress. AWS does not allow creating private static IPs without the use of an ENI AFAIK. This would make this solution only possible with public IP addresses
@moxious
Contributor

moxious commented Jul 8, 2020

I have to admit I'm not entirely following the setup that he's describing. I'm not opposed to this as a request, but I need to know how it works.

OK, so he has an LB externally that traffic gets in through, got it. In that thread, the ingress ends up pointing to a service he created, which targets all 3 core pods. If it's a single instance deploy I can see how this is going to work and be successful. If it's a cluster, I don't understand how this could possibly work. Because in order for clustering to work:

  • Each core member has to know what its separate distinct external address is (the ingress DNS name, whatever)
  • Traffic for each of the ingress DNS names needs to be routed to a particular pod, not just any pod the service is attached to.
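
The two requirements above can be sketched as follows. This is only an illustration: the pod names, the domain, and the helper functions are hypothetical, and the setting name `dbms.connector.bolt.advertised_address` is assumed to match Neo4j 4.x, so check it against the version you run.

```python
from typing import Dict, List


def advertised_addresses(pod_names: List[str], domain: str, bolt_port: int = 7687) -> Dict[str, str]:
    """Give each core pod its own externally resolvable bolt address."""
    return {pod: f"{pod}.{domain}:{bolt_port}" for pod in pod_names}


def check_distinct(addresses: Dict[str, str]) -> None:
    """Raise if two pods would advertise the same external address."""
    if len(set(addresses.values())) != len(addresses):
        raise ValueError("each core member needs its own external address")


pods = ["neo4j-core-0", "neo4j-core-1", "neo4j-core-2"]  # hypothetical pod names
addrs = advertised_addresses(pods, "graph.example.com")   # hypothetical domain
check_distinct(addrs)

for pod, addr in addrs.items():
    # Assumed Neo4j 4.x setting name; each pod gets a different value.
    print(f"{pod}: dbms.connector.bolt.advertised_address={addr}")
```

A single Service that targets all three pods fails the second requirement: the three names would resolve, but a connection to any of them could land on any pod.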

The poster in that thread said he got it to work, but based on what's there, it wouldn't work -- he must have made subsequent changes he didn't post which made it work. Any ideas?

FWIW -- in the directions that are already in the repo, you don't really have to have static IP addresses. That was just a stand-in for "an external addressable name that we can use for advertising/routing". You could use a different private internal DNS and the general approach would still work. But notably -- in that other approach, you'll notice it's 3 LBs, one per pod - because we need to route traffic to each pod individually. Similarly, there's no reason you couldn't use an nginx-ingress LB instead of the ones suggested in those docs. Part of what I'm trying to figure out here is if the nginx-ingress approach is really going to be different in the end anyway.
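
As a rough sketch of "one LB per pod", the snippet below builds one Service manifest per core pod as a plain Python dict. It is not the manifest shipped in this repo. The pod names and the `-external` suffix are hypothetical; the `statefulset.kubernetes.io/pod-name` label is the one Kubernetes adds to StatefulSet pods, which is what lets a Service select exactly one pod.

```python
import json
from typing import Any, Dict


def per_pod_service(pod_name: str, bolt_port: int = 7687) -> Dict[str, Any]:
    """Build a Service that routes bolt traffic to exactly one StatefulSet pod."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{pod_name}-external"},  # hypothetical naming scheme
        "spec": {
            "type": "LoadBalancer",
            # Label set by the StatefulSet controller on each pod it creates.
            "selector": {"statefulset.kubernetes.io/pod-name": pod_name},
            "ports": [{"name": "bolt", "port": bolt_port, "targetPort": bolt_port}],
        },
    }


# Hypothetical pod names for a 3-member core set.
services = [per_pod_service(f"neo4j-core-{i}") for i in range(3)]
print(json.dumps(services[0], indent=2))
```

Whether the front end is a cloud LB or nginx-ingress, the same per-pod targeting is needed, which is why the two approaches may end up looking similar.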

@gwvandesteeg
Contributor

gwvandesteeg commented Jul 12, 2020

So far, from what I've been able to see, none of the material documented in exposing the service to material outside of the kubernetes cluster will work on AWS if you are only intending to use it from inside your VPC. It might work if you want to expose it to the world using Elastic Network Interfaces with publicly routed IPs.
But if your intent is to make it available to the same VPC or another VPC (via a peering connection or transit gateway) where the traffic is routed internally only and never travels across the public internet, it is not going to work.
I'd love to see details from someone who does have it working using either nginx-ingress, or internal load balancers created using kubernetes resources.
The ideal situation is where you stick the database behind a standard load balancer and connect to that single point and the routing is dealt with for you by whatever is on the other end of the load balancer, that'd make the deployment simplest to work with (a proxy/distribution pod that deals with it for example, perhaps run as a sidecar with the nodes themselves).

@moxious
Contributor

moxious commented Jul 13, 2020

> none of the material documented in exposing the service to material outside of the kubernetes cluster will work on AWS if you are only intending to use it from inside your VPC

@gwvandesteeg I think you're right about this, but this is also a different ask than I've heard from most people. See the thing is, a lot of people are trying to enable the use of things like Bloom & Browser, which implies a connection from their laptop all the way through to Neo4j. If you're using Neo4j entirely within a VPC well then yes you still have to expose it out to the VPC, but probably you're not using browser in that setup (unless it's more exotic and you have a workstation running inside the VPC that you're remote desktop'ing into)

> The ideal situation is where you stick the database behind a standard load balancer and connect to that single point and the routing is dealt with for you by whatever is on the other end of the load balancer, that'd make the deployment simplest to work with (a proxy/distribution pod that deals with it for example, perhaps run as a sidecar with the nodes themselves).

I agree that's the ideal situation, but I want to state clearly that as far as my understanding goes, it isn't possible. Neo4j uses a smart client routing protocol that means that LBs in the middle routing traffic actively break the way the Neo4j clients function.
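
To make the failure mode concrete, here is a small simulation with no real driver involved. A routing driver first asks a cluster member for a routing table, then opens connections directly to the addresses listed in it. The addresses, the `RoutingTable` type, and both functions below are hypothetical stand-ins for that behaviour.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class RoutingTable:
    writers: List[str]
    readers: List[str]


def fetch_routing_table() -> RoutingTable:
    """Stand-in for the table a cluster member returns.

    The entries are whatever each member advertises, not the LB address.
    """
    return RoutingTable(
        writers=["neo4j-core-0.internal:7687"],
        readers=["neo4j-core-1.internal:7687", "neo4j-core-2.internal:7687"],
    )


def unreachable_members(table: RoutingTable, reachable: Set[str]) -> List[str]:
    """List advertised addresses the client has no network path to."""
    return [addr for addr in table.writers + table.readers if addr not in reachable]


# The client can only reach the single load balancer in front of the cluster.
client_can_reach = {"lb.example.com:7687"}
table = fetch_routing_table()
missing = unreachable_members(table, client_can_reach)
print(f"client cannot reach {len(missing)} advertised members: {missing}")
```

The first connection through the LB succeeds, but every follow-up connection targets an address from the table, so the LB is bypassed and the session fails unless each advertised address is reachable on its own.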

@gwvandesteeg
Contributor

> > none of the material documented in exposing the service to material outside of the kubernetes cluster will work on AWS if you are only intending to use it from inside your VPC
>
> @gwvandesteeg I think you're right about this, but this is also a different ask than I've heard from most people. See the thing is, a lot of people are trying to enable the use of things like Bloom & Browser, which implies a connection from their laptop all the way through to Neo4j. If you're using Neo4j entirely within a VPC well then yes you still have to expose it out to the VPC, but probably you're not using browser in that setup (unless it's more exotic and you have a workstation running inside the VPC that you're remote desktop'ing into)

Or you're providing connectivity to the VPC via VPN and allowing your local users to connect with Bloom/Browser via that VPN connectivity.

> > The ideal situation is where you stick the database behind a standard load balancer and connect to that single point and the routing is dealt with for you by whatever is on the other end of the load balancer, that'd make the deployment simplest to work with (a proxy/distribution pod that deals with it for example, perhaps run as a sidecar with the nodes themselves).
>
> I agree that's the ideal situation, but I want to state clearly that as far as my understanding goes, it isn't possible. Neo4j uses a smart client routing protocol that means that LBs in the middle routing traffic actively break the way the Neo4j clients function.

Yeah, i was thinking about an intermediate proxy application between the LB and the Neo4J instance, connections to the LB connect to the proxy application which you can spin up as many instances you want of, and then the proxy communicates with the Neo4J servers and deals with the smart routing component. So you'd have:
client -> LB -> proxy -> neo4j

I'm curious to see if this could be done, or if a plugin for envoy-proxy could be created to make it work.
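
A very rough sketch of the routing decision such a proxy would have to make is below. It is not an existing component: the `RoutingProxy` class, the role names, and the addresses are all hypothetical, and a real proxy would also need to speak bolt, track leader changes, and handle transactions.

```python
import itertools
from typing import Dict


class RoutingProxy:
    """Pick a backend per request so the client only ever sees the proxy."""

    def __init__(self, roles: Dict[str, str]) -> None:
        leader = next((addr for addr, role in roles.items() if role == "LEADER"), None)
        if leader is None:
            raise ValueError("no leader known; writes cannot be routed")
        self._leader = leader
        followers = [addr for addr, role in roles.items() if role == "FOLLOWER"]
        # Fall back to the leader for reads if there are no followers.
        self._readers = itertools.cycle(followers or [leader])

    def backend_for(self, access_mode: str) -> str:
        """Writes go to the leader; reads rotate across followers."""
        if access_mode == "WRITE":
            return self._leader
        if access_mode == "READ":
            return next(self._readers)
        raise ValueError(f"unknown access mode: {access_mode}")


# Hypothetical cluster view held by the proxy, not by the client.
proxy = RoutingProxy({
    "neo4j-core-0:7687": "LEADER",
    "neo4j-core-1:7687": "FOLLOWER",
    "neo4j-core-2:7687": "FOLLOWER",
})
print(proxy.backend_for("WRITE"))
print(proxy.backend_for("READ"))
```

Because the proxy instances are stateless with respect to the client, any number of them could sit behind one ordinary LB, which is the `client -> LB -> proxy -> neo4j` shape described above.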

@moxious
Contributor

moxious commented Jul 14, 2020

> Yeah, i was thinking about an intermediate proxy application between the LB and the Neo4J instance, connections to the LB connect to the proxy application which you can spin up as many instances you want of, and then the proxy communicates with the Neo4J servers and deals with the smart routing component. So you'd have:
> client -> LB -> proxy -> neo4j

I know from internal Neo4j work that this can be done. It's complicated but it can work. I don't mean to be a jerk about this but I can say that we won't support this kind of an approach in this repo unless we have a separate, stable, open source proxy app that's available and agreed that it works. The goal of what we're trying to do here is to expose as close to the regular product surface of Neo4j as possible, but with a kubernetes flavor. There are a bunch of application scenarios where if you can add new software components into the mix that aren't neo4j and also aren't kubernetes, then you can enable some cool usage patterns. I don't mean to discount the value of doing those things - but that starts down the path of "kubernetes applications built on neo4j" rather than "neo4j running in kubernetes". The distinction is important, because it speaks to what I'd need to be able to help & support on. Extra app & proxy components would need to be downstream software.

I see the issue here -- but bottom line is I don't see a clear path forward for how to support this. I'm open to a PR if you'd like to try one.

@moxious moxious added the enhancement New feature or request label Aug 7, 2020

3 participants