Add support for nginx-ingress #42

JakeNeyer · 2020-07-08T19:03:57Z

This issue has been outlined fairly well here: https://community.neo4j.com/t/cannot-connect-to-cluster-using-k8s-ingress/15476/7

The intent is to expose the services via an nginx-ingress so that connections to bolt can be made from outside the cluster. The current external access documentation (https://github.com/neo4j-contrib/neo4j-helm/blob/master/tools/external-exposure/EXTERNAL-EXPOSURE.md) has a few shortcomings:

Additional static IP addresses are required instead of using an existing load balancer that nginx-ingress manages.
Changing A records for the static IPs could become a nuisance in environments where resources are created and destroyed frequently.
In my particular case, I am using an internal ELB with nginx-ingress. AFAIK, AWS does not allow creating private static IPs without the use of an ENI, so this solution would only be possible with public IP addresses.

moxious · 2020-07-08T21:55:27Z

I have to admit I'm not entirely following the setup that he's describing. I'm not opposed to this as a request, but I need to know how it works.

OK, so he has an LB externally that traffic gets in through, got it. In that thread, the ingress ends up pointing to a service he created, which targets all 3 core pods. If it's a single instance deploy I can see how this is going to work and be successful. If it's a cluster, I don't understand how this could possibly work. Because in order for clustering to work:

Each core member has to know what its separate distinct external address is (the ingress DNS name, whatever)
Traffic for each of the ingress DNS names needs to be routed to a particular pod, not just any pod the service is attached to.
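
To make the two requirements above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the release name `graph`, the pod names `graph-core-N`, the domain `example.internal`, and the service name `graph-core-service` are all hypothetical and do not come from the chart or from the linked thread. It checks whether every external hostname resolves to exactly one distinct pod, which is what per-pod routing would need.

```python
from typing import Dict


def advertised_hosts(release: str, cores: int, domain: str) -> Dict[str, str]:
    """Give each core pod its own external DNS name (pod name -> hostname).

    The naming scheme is hypothetical and used only for illustration.
    """
    return {
        f"{release}-core-{i}": f"{release}-core-{i}.{domain}"
        for i in range(cores)
    }


def is_one_to_one(routes: Dict[str, str]) -> bool:
    """Return True if every hostname routes to a different backend.

    `routes` maps external hostname -> backend that receives the traffic.
    """
    backends = list(routes.values())
    return len(set(backends)) == len(backends)


hosts = advertised_hosts("graph", 3, "example.internal")

# Per-pod routing: each external hostname lands on one specific pod.
per_pod_routes = {host: pod for pod, host in hosts.items()}

# Single shared service: every hostname lands on "any pod behind the service".
shared_service_routes = {host: "graph-core-service" for host in hosts.values()}

print(is_one_to_one(per_pod_routes))
print(is_one_to_one(shared_service_routes))
```

In this toy model the shared-service mapping fails the check, which is the situation described above where the ingress points at one service targeting all core pods.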

The poster in that thread said he got it to work, but based on what's there, it wouldn't work -- he must have made subsequent changes he didn't post which made it work. Any ideas?

FWIW -- in the directions that are already in the repo, you don't really have to have static IP addresses. That was just a stand-in for "an external addressable name that we can use for advertising/routing". You could use a different private internal DNS and the general approach would still work. But notably -- in that other approach, you'll notice it's 3 LBs, one per pod - because we need to route traffic to each pod individually. Similarly, there's no reason you couldn't use an nginx-ingress LB instead of the ones suggested in those docs. Part of what I'm trying to figure out here is if the nginx-ingress approach is really going to be different in the end anyway.

gwvandesteeg · 2020-07-12T12:14:39Z

So far, from what I've been able to see, none of the material documented in exposing the service to material outside of the kubernetes cluster will work on AWS if you are only intending to use it from inside your VPC. It might work if you want to expose it to the world using Elastic Network Interfaces with publicly routed IPs.
But if your intent is to make it available to the same VPC or another VPC (via a peering connection or transit gateway) where the traffic is routed internally only and never travels across the public internet, it is not going to work.
I'd love to see details from someone who does have it working using either nginx-ingress, or internal load balancers created using kubernetes resources.
The ideal situation is where you stick the database behind a standard load balancer and connect to that single point and the routing is dealt with for you by whatever is on the other end of the load balancer, that'd make the deployment simplest to work with (a proxy/distribution pod that deals with it for example, perhaps run as a sidecar with the nodes themselves).

moxious · 2020-07-13T14:13:18Z

> none of the material documented in exposing the service to material outside of the kubernetes cluster will work on AWS if you are only intending to use it from inside your VPC

@gwvandesteeg I think you're right about this, but this is also a different ask than I've heard from most people. See the thing is, a lot of people are trying to enable the use of things like Bloom & Browser, which implies a connection from their laptop all the way through to Neo4j. If you're using Neo4j entirely within a VPC well then yes you still have to expose it out to the VPC, but probably you're not using browser in that setup (unless it's more exotic and you have a workstation running inside the VPC that you're remote desktop'ing into)

> The ideal situation is where you stick the database behind a standard load balancer and connect to that single point and the routing is dealt with for you by whatever is on the other end of the load balancer, that'd make the deployment simplest to work with (a proxy/distribution pod that deals with it for example, perhaps run as a sidecar with the nodes themselves).

I agree that's the ideal situation, but I want to state clearly that as far as my understanding goes, it isn't possible. Neo4j uses a smart client routing protocol that means that LBs in the middle routing traffic actively break the way the Neo4j clients function.
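
As a rough illustration of why a load balancer in the middle gets in the way, here is a simplified Python model of client-side routing. It is a sketch, not the actual Bolt routing protocol or driver code: the addresses, the role names as plain strings, and both functions are assumptions made up for the example. The idea it models is that the client asks the cluster for a routing table, receives each member's advertised address, and then connects to those addresses directly.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class RoutingTable:
    writers: List[str]
    readers: List[str]


def routing_table(cluster: Dict[str, str]) -> RoutingTable:
    """Simplified: the cluster reports members' advertised addresses by role.

    `cluster` maps advertised address -> role ("LEADER" or "FOLLOWER").
    """
    writers = [addr for addr, role in cluster.items() if role == "LEADER"]
    readers = [addr for addr, role in cluster.items() if role == "FOLLOWER"]
    return RoutingTable(writers=writers, readers=readers)


def pick_server(table: RoutingTable, access_mode: str, reachable: Set[str]) -> str:
    """Choose an advertised address the client can actually reach."""
    candidates = table.writers if access_mode == "WRITE" else table.readers
    for addr in candidates:
        if addr in reachable:
            return addr
    raise ConnectionError(f"no reachable server for {access_mode}")


# Hypothetical cluster that advertises cluster-internal names.
cluster = {
    "core-0.internal:7687": "LEADER",
    "core-1.internal:7687": "FOLLOWER",
    "core-2.internal:7687": "FOLLOWER",
}
table = routing_table(cluster)

# An outside client that can only reach the load balancer address.
outside_client = {"lb.example.com:7687"}
try:
    pick_server(table, "WRITE", outside_client)
except ConnectionError as err:
    print("outside client:", err)

# A client that can reach every advertised address directly.
print("inside client:", pick_server(table, "WRITE", set(cluster)))
```

In this model the first connection could go through the load balancer, but every address in the routing table still has to be reachable from the client, which a single LB address does not provide.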

gwvandesteeg · 2020-07-13T23:20:13Z

> > none of the material documented in exposing the service to material outside of the kubernetes cluster will work on AWS if you are only intending to use it from inside your VPC

> @gwvandesteeg I think you're right about this, but this is also a different ask than I've heard from most people. See the thing is, a lot of people are trying to enable the use of things like Bloom & Browser, which implies a connection from their laptop all the way through to Neo4j. If you're using Neo4j entirely within a VPC well then yes you still have to expose it out to the VPC, but probably you're not using browser in that setup (unless it's more exotic and you have a workstation running inside the VPC that you're remote desktop'ing into)

Or you're providing connectivity to the VPC via VPN and allowing your local users to connect with Bloom/Browser via that VPN connectivity.

> > The ideal situation is where you stick the database behind a standard load balancer and connect to that single point and the routing is dealt with for you by whatever is on the other end of the load balancer, that'd make the deployment simplest to work with (a proxy/distribution pod that deals with it for example, perhaps run as a sidecar with the nodes themselves).

> I agree that's the ideal situation, but I want to state clearly that as far as my understanding goes, it isn't possible. Neo4j uses a smart client routing protocol that means that LBs in the middle routing traffic actively break the way the Neo4j clients function.

Yeah, I was thinking about an intermediate proxy application between the LB and the Neo4j instance. Connections to the LB connect to the proxy application, which you can spin up as many instances of as you want, and the proxy then communicates with the Neo4j servers and deals with the smart routing component. So you'd have:
client -> LB -> proxy -> neo4j

I'm curious to see if this could be done, or if a plugin for envoy-proxy could be created to make it work.
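
A very rough sketch of the routing decision such a proxy might make, in Python. This is a toy model and not a working proxy: it does not speak Bolt, it does not handle transactions or failover, and the class name `RoleAwareProxy`, the addresses, and the role strings are all hypothetical. It only shows the idea that the proxy, rather than the client, tracks which member is the leader and which are followers.

```python
import itertools
from typing import Dict, Iterator


class RoleAwareProxy:
    """Toy model of a proxy that hides cluster topology from clients."""

    def __init__(self, members: Dict[str, str]) -> None:
        self.update_roles(members)

    def update_roles(self, members: Dict[str, str]) -> None:
        """Refresh the view of the cluster (address -> role)."""
        leaders = [addr for addr, role in members.items() if role == "LEADER"]
        if len(leaders) != 1:
            raise ValueError("expected exactly one LEADER")
        self._leader = leaders[0]
        followers = [addr for addr, role in members.items() if role == "FOLLOWER"]
        # With no followers, reads fall back to the leader.
        self._readers: Iterator[str] = itertools.cycle(followers or [self._leader])

    def route(self, access_mode: str) -> str:
        """Pick the backend for one request: writes go to the leader,
        reads are spread round-robin across followers."""
        if access_mode == "WRITE":
            return self._leader
        if access_mode == "READ":
            return next(self._readers)
        raise ValueError(f"unknown access mode: {access_mode}")


proxy = RoleAwareProxy({
    "core-0:7687": "LEADER",
    "core-1:7687": "FOLLOWER",
    "core-2:7687": "FOLLOWER",
})
print(proxy.route("WRITE"))
```

Because clients would only ever see the LB address, any number of identical proxy instances could sit behind it, provided each one keeps its view of the cluster roles up to date (here via the hypothetical `update_roles`).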

moxious · 2020-07-14T11:42:21Z

> Yeah, i was thinking about an intermediate proxy application between the LB and the Neo4J instance, connections to the LB connect to the proxy application which you can spin up as many instances you want of, and then the proxy communicates with the Neo4J servers and deals with the smart routing component. So you'd have:
> client -> LB -> proxy -> neo4j

I know from internal Neo4j work that this can be done. It's complicated but it can work. I don't mean to be a jerk about this but I can say that we won't support this kind of an approach in this repo unless we have a separate, stable, open source proxy app that's available and agreed that it works. The goal of what we're trying to do here is to expose as close to the regular product surface of Neo4j as possible, but with a kubernetes flavor. There are a bunch of application scenarios where if you can add new software components into the mix that aren't neo4j and also aren't kubernetes, then you can enable some cool usage patterns. I don't mean to discount the value of doing those things - but that starts down the path of "kubernetes applications built on neo4j" rather than "neo4j running in kubernetes". The distinction is important, because it speaks to what I'd need to be able to help & support on. Extra app & proxy components would need to be downstream software.

I see the issue here -- but bottom line is I don't see a clear path forward for how to support this. I'm open to a PR if you'd like to try one.

moxious added the enhancement label Aug 7, 2020

moxious mentioned this issue Oct 12, 2020

Backup/Restore to/from AWS S3 bucket #100

Closed
