---
title: Using Neo4j for PKS
owner: Partners
---
<strong><%= modified_date %></strong>
This topic describes how to use Neo4j for PKS.
## <a id='using'></a> Using Neo4j for PKS
This section describes how to use Neo4j on PKS. The material here covers only
PKS-specific details and supplements Neo4j's
[main user documentation](https://neo4j.com/docs/), which should be consulted for
additional questions and further guidance.
### <a id='cyphershell'></a> Cypher Shell
The Cypher Shell is the simplest CLI-driven way of querying the database.
To connect to your cluster, issue the following command, replacing `APP_INSTANCE_NAME` with the name you chose when you deployed Neo4j.
```
APP_INSTANCE_NAME=my-graph
# Set NEO4J_PASSWORD as described in the "Getting your Password" section below
kubectl run -it --rm cypher-shell \
  --image=causal-cluster:3.5.7 \
  --restart=Never \
  --namespace=default \
  --command -- ./bin/cypher-shell -u neo4j \
  -p "$NEO4J_PASSWORD" \
  -a $APP_INSTANCE_NAME-neo4j.default.svc.cluster.local
```
Once you have basic Bolt client and Cypher Shell connectivity, consult the standard Neo4j documentation for the many other usage options available.
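For example, once connected, a quick smoke test (a minimal sketch, assuming an otherwise empty database and a hypothetical `Person` label) might look like:
```
CREATE (p:Person {name: 'smoke-test'});
MATCH (p:Person {name: 'smoke-test'}) RETURN p.name;
MATCH (p:Person {name: 'smoke-test'}) DELETE p;
```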
### <a id='neo4jbrowser'></a> Neo4j Browser
Neo4j Browser is available on port 7474 of any of the hostnames described in the Hostnames section below. However, because of the network environment that the cluster is in, hosts in the Neo4j cluster advertise themselves with private internal DNS names that are not resolvable from outside of the cluster. See the "Security" and "Limitations" sections in this document for a discussion of the issues, and for pointers on how you can configure your database with your organization's DNS entries to enable this access.
Browser access can be enabled with the following steps.
#### <a id='neo4jbrowser-leader'></a> Determine your Cluster Leader
When the cluster forms, one node is chosen as the leader; in Neo4j's
topology, only the leader may accept writes. To determine which host is the
leader, run the cypher-shell command given above and call the
procedure `CALL dbms.cluster.overview();`. This returns a
routing table indicating which host (and pod) is the leader.
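As a convenience, you can filter the routing table for the leader directly (a minimal sketch using the Neo4j 3.5 procedure):
```
CALL dbms.cluster.overview()
YIELD addresses, role
WHERE role = 'LEADER'
RETURN addresses;
```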
If you only want to run read queries, it makes no difference which pod you
attach to.
#### <a id='neo4jbrowser-forward'></a> Forward Local Ports
First, use [kubectl to forward ports](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/) on your cluster's leader pod to your localhost.
```
MY_CLUSTER_LEADER_POD=mygraph-neo4j-core-0
kubectl port-forward $MY_CLUSTER_LEADER_POD 7687:7687 7474:7474
```
In this example, the deployment is named "mygraph", and the "core-0" pod is the leader. We are forwarding port 7474 for browser access, and port 7687 for Bolt access.
#### <a id='neo4jbrowser-connect'></a> Connect to Browser
Open a browser to `http://localhost:7474` and you will be connecting directly to the cluster leader.
**Important**: when launching Neo4j Browser, you will see a `:server connect` box configured to connect to the host's advertised name, which is an internal DNS name. Make sure to change this to `localhost`, because the private DNS name is not resolvable from your local machine.
Use your username and password as normal.
Because you're connecting to the leader, both reads and writes are possible. The same approach can be used to attach a browser instance on any local port to any of your pods.
### <a id='users'></a> Adding Users, Changing Passwords
Once connected via a cypher shell, you may use any of the [existing procedures](https://neo4j.com/docs/operations-manual/current/tutorial/role-based-access-control/)
for user and role management provided as part of Neo4j.
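For example, adding a read-only user might look like this (a minimal sketch, assuming Neo4j 3.5 Enterprise role-based access control; `alice` is a hypothetical user name):
```
// Create a user who must change the password on first login
CALL dbms.security.createUser('alice', 'initial-password', true);
// Grant the built-in read-only role
CALL dbms.security.addRoleToUser('reader', 'alice');
// Confirm the result
CALL dbms.security.listUsers();
```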
### <a id='advanced'></a> Advanced/Custom Configuration of Neo4j
The deploy for PKS is based on Neo4j's public Docker containers. This means
that [Neo4j documentation for Docker](https://neo4j.com/docs/operations-manual/current/installation/docker/?ref=pivotal-pks) applies to the configuration of these pods. In general, to specify an advanced configuration, edit the core-statefulset template and the readreplicas-statefulset template to specify environment variables passed to the Docker containers, following the guidance of the Docker documentation listed above.
For example, to enable query logging in Neo4j, you would need to set
`dbms.logs.query.enabled=true` inside of the container. To do that, add the
environment variable `NEO4J_dbms_logs_query_enabled=true` to the relevant
templates; this environment variable is passed through to the Docker
container, which configures Neo4j appropriately.
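As a sketch, the relevant part of the StatefulSet pod template might look like the following excerpt (the container name and image tag are assumptions; adjust them to match your deployment):
```
containers:
  - name: neo4j
    image: "neo4j:3.5.7-enterprise"
    env:
      - name: NEO4J_dbms_logs_query_enabled
        value: "true"
```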
### <a id='password'></a> Getting your Password
After installation, whether a password was specified or generated, the
password is stored in a Kubernetes secret that is attached to your
deployment. Given a deployment named "my-graph", you can find the password
as the "neo4j-password" key under the my-graph-neo4j-secrets configuration
item in Kubernetes. The password is base64 encoded, and can be recovered as
plaintext by authorized users with this command:
```
kubectl get secrets $APP_INSTANCE_NAME-neo4j-secrets -o yaml | \
  grep neo4j-password: | \
  sed 's/.*neo4j-password: *//' | \
  base64 --decode
```
This password applies for the base administrative user named “neo4j”.
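If you prefer to avoid the grep/sed parsing, an equivalent sketch using a JSONPath query (assuming your deployment name is in `APP_INSTANCE_NAME`) captures the decoded value in the `NEO4J_PASSWORD` variable used by the Cypher Shell command above:
```
NEO4J_PASSWORD=$(kubectl get secrets $APP_INSTANCE_NAME-neo4j-secrets \
  -o 'jsonpath={.data.neo4j-password}' | base64 --decode)
```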
### <a id='hostnames'></a> Hostnames
All Neo4j cluster nodes inside of PKS get private DNS names that you can use to access them. Host names are generated as follows: `$NAMESPACE` refers to the Kubernetes namespace used to deploy Neo4j, and `$APP_INSTANCE_NAME` refers to the name it was deployed under. The variable `$N` refers to the individual cluster node. For clusters with 3 core nodes, `$N` can be 0, 1, or 2, and a total of three hostnames will be valid.
- Core Host: `$APP_INSTANCE_NAME-neo4j-core-$N.$APP_INSTANCE_NAME-neo4j.$NAMESPACE.svc.cluster.local`
- Read Replica Host: `$APP_INSTANCE_NAME-neo4j-replica-$N.neo4j-readreplica.$NAMESPACE.svc.cluster.local`
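To confirm that a hostname resolves from inside the cluster, one sketch is a short-lived busybox pod (the pod name `dns-test` is arbitrary):
```
kubectl run -it --rm dns-test --image=busybox --restart=Never -- \
  nslookup $APP_INSTANCE_NAME-neo4j-core-0.$APP_INSTANCE_NAME-neo4j.$NAMESPACE.svc.cluster.local
```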
### <a id='services'></a> Exposed Services
By default, each node will expose:
- HTTP on port 7474
- HTTPS on port 7473
- Bolt on port 7687
Exposed services and port mappings can be configured by referencing Neo4j's Docker documentation. See the advanced configuration section in this document for how to change the way the Docker containers in each pod are configured.
### <a id='addrs'></a> Service Address
Additionally, a service address inside of the cluster will be available as follows. To determine your service address, substitute `$APP_INSTANCE_NAME` with the name you deployed Neo4j under, and `$NAMESPACE` with the Kubernetes namespace where Neo4j resides:
`$APP_INSTANCE_NAME-neo4j.$NAMESPACE.svc.cluster.local`
Any client may connect to this address, as it is a DNS record with multiple entries pointing to the nodes which back the cluster. For example, bolt+routing clients can use this address to bootstrap their connection into the cluster, subject to the items in the limitations section.
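For example, a Cypher Shell session started from a pod inside the cluster can bootstrap a routed connection through this address (a sketch, assuming `NEO4J_PASSWORD` is set as described above):
```
./bin/cypher-shell -u neo4j -p "$NEO4J_PASSWORD" \
  -a bolt+routing://$APP_INSTANCE_NAME-neo4j.$NAMESPACE.svc.cluster.local
```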
## <a id='BACKUP'></a> Backing up your Database
Please consult [How to Backup Neo4j Running in Kubernetes](https://medium.com/neo4j/how-to-backup-neo4j-running-in-kubernetes-3697761f229av)
## <a id='RESTORE'></a> Restoring from Backup
Please consult [How to Restore Neo4j Running in Kubernetes](https://medium.com/google-cloud/how-to-restore-neo4j-backups-on-kubernetes-and-gke-6841aa1e3961)
## <a id='SCALING'></a> Scaling
This section describes how to scale Neo4j on PKS up and down according to needs.
### <a id='scaling-planning'></a> Planning
Before scaling a database running on Kubernetes, consult the Neo4j documentation on clustering architecture in depth, and in particular take care when choosing between adding core nodes or read replicas. This planning process should also account for the Kubernetes layer and where the node pools reside. For example, adding extra core nodes to protect data with additional redundancy may not provide extra guarantees if all Kubernetes nodes are in the same zone.
For many users and use cases, careful planning on initial database sizing is preferable to later attempts to rapidly scale the cluster.
When a new node joins a Neo4j cluster, it needs to replicate the existing data from the other
members. This can create a temporarily higher load on the existing nodes as they replicate data
to the new member, and for very large databases it can cause temporary unavailability under heavy
loads. We recommend that when setting up a scalable instance of Neo4j, you configure pods to
restore from a recent backup set before starting; see the Restoring from Backup section above.
In this way, new pods are mostly caught up before entering the cluster, and the "catch-up"
process is minimal both in terms of time spent and load placed on the rest of the cluster.
Because of the data-intensive nature of any database, careful planning before scaling is highly recommended. Storage allocation for each new node is also needed; as a result, when scaling the database, the Kubernetes cluster will create new persistent volume claims and volumes.
Because Neo4j's configuration is different in single-node mode (`dbms.mode=SINGLE`), you should not
scale a deployment if it was initially deployed with 1 coreServer. Doing so would result in multiple
independent databases, not one cluster.
### <a id='scaling-execution'></a>Execution
Neo4j on PKS consists of two StatefulSets in Kubernetes: one for core nodes, and one for replicas. Even if you chose zero replicas in your configuration, you will see a StatefulSet with zero members.
Scaling the database is a matter of scaling one of these StatefulSets. In this example, we will add a read replica to a cluster that has 1 existing replica.
- Choose “Workloads” from the PKS menu
- Select the replica StatefulSet, which will be named `$APP_INSTANCE_NAME-neo4j-replica`
- Click the “Scale” button on the top toolbar
- You will be given a dialog showing the current count of 1 replica (in this scenario). Change it to 2, and click the “Scale” button.
- This kicks off the process of scheduling a new pod into this replica set. The pod will be pre-configured to connect to the rest of the cluster, so no further action is needed.
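Equivalently, if you prefer the command line to the PKS console, the same StatefulSet can be scaled with `kubectl` (a sketch, assuming the StatefulSet name above):
```
kubectl scale statefulsets $APP_INSTANCE_NAME-neo4j-replica --replicas=2
```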
Depending on the size of your database and how busy the other members are, it may take considerable time for the cluster topology to show the presence of the new member, as it connects to the cluster and performs catch-up.
Once the new node is caught up, you can execute the cypher query `CALL dbms.cluster.overview();` to verify that the new node is operational.
### <a id='warnings'></a> Warnings and Indications
Scaled pods inherit their configuration from their StatefulSet. For Neo4j, this means that items like configured storage size, hardware limits, and passwords apply to scaled-up members.
If scaling down, do not scale below three core nodes; this is the minimum necessary to guarantee a properly functioning cluster with data redundancy. Consult the Neo4j clustering documentation for more information.
Neo4j on PKS is configured to never delete the backing data drive, to lessen the chance of inadvertent data destruction. If you scale up and then later scale down, this may orphan a disk, which you may want to manually delete at a later date.
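A sketch of finding and removing such an orphaned volume claim follows; the exact claim names depend on your deployment, so inspect the list before deleting anything:
```
# List the persistent volume claims remaining in the namespace
kubectl get pvc --namespace=$NAMESPACE
# Delete a claim that is no longer backed by a running pod (hypothetical name)
kubectl delete pvc datadir-my-graph-neo4j-replica-1 --namespace=$NAMESPACE
```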
## <a id='security'></a> Security
For security reasons, access to the database cluster from outside of Kubernetes is not enabled
by default; instead, users are expected to configure network access policies appropriate
for their usage. If external access is desired, we recommend exposing each pod individually rather than using a
load balancer service across all pods, because Neo4j's cluster architecture differentiates
between the leader and followers; as a result, not all pods may be treated interchangeably.