
Security and Authz


General Application Architecture w.r.t. Security

(Diagram: NetworkGraph — overview of the deployment's components and their connections)

The application as a whole consists of:

  • ContainerJFR Deployment
    • ContainerJFR Pod
      • ContainerJFR container instance
        • Service + Route
      • (optional) Grafana container instance
        • Service + Route
      • (optional) jfr-datasource container instance
        • Service - no route, only accessible from within the cluster
      • PersistentVolumeClaim for archived recordings
      • Secret for Grafana credentials
  • Operator Pod
    • container-jfr-operator instance, containing various controllers

The Routes exposing the ContainerJFR Pod's Services are configured with TLS termination, so all connections from outside the cluster use HTTPS/WSS secured by the OpenShift cluster's SSL certificate.
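As a sketch, such a Route might be declared via the OpenShift route/v1 API types as below. The route name, target Service, and the choice of edge termination are illustrative assumptions, not the Operator's exact output:

```go
import (
	routev1 "github.com/openshift/api/route/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newRoute sketches a TLS-terminating Route for the ContainerJFR Service.
func newRoute(namespace string) *routev1.Route {
	return &routev1.Route{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "containerjfr", // illustrative name
			Namespace: namespace,
		},
		Spec: routev1.RouteSpec{
			To: routev1.RouteTargetReference{
				Kind: "Service",
				Name: "containerjfr", // illustrative Service name
			},
			TLS: &routev1.TLSConfig{
				// Terminate TLS at the Route with the cluster's SSL cert.
				Termination: routev1.TLSTerminationEdge,
			},
		},
	}
}
```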

ContainerJFR's "minimal" image omits the web client assets but is otherwise identical to the standard image. When the Operator sees a ContainerJFR Custom Resource created with the minimal: true CR spec field, it selects the minimal ContainerJFR image, deploys it without the accompanying Grafana and jfr-datasource containers, and does not set the Grafana-related environment variables on the ContainerJFR container. The ContainerJFR instance itself still handles all HTTPS and WSS communications identically in this case, and all authz protocols are still enforced.
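For illustration, such a Custom Resource might be constructed against the Operator's API types as below. The Go field name Minimal and the operator import path are assumptions inferred from the minimal: true spec field described above:

```go
import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	rhjmcv1alpha1 "github.com/rh-jmc-team/container-jfr-operator/pkg/apis/rhjmc/v1alpha1"
)

// A ContainerJFR Custom Resource requesting the minimal deployment.
var cr = &rhjmcv1alpha1.ContainerJFR{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "containerjfr-sample", // illustrative name
		Namespace: "my-project",
	},
	Spec: rhjmcv1alpha1.ContainerJFRSpec{
		// Selects the web-client-less image; the Operator then omits the
		// Grafana and jfr-datasource containers and their env setup.
		Minimal: true,
	},
}
```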

Flow of JFR Data

ContainerJFR connects to other JVM applications within its cluster using remote JMX over cluster-internal URLs, so that no traffic can leave the cluster. ContainerJFR supports connecting to target JVMs with JMX auth credentials enabled ("Basic" style authentication). It can also be configured to trust the SSL certificates used by target JVMs: adding a certificate to a volume mounted at /truststore on the ContainerJFR container adds it to the SSL trust store used by ContainerJFR. ContainerJFR generates its own self-signed SSL certificate at startup and uses it to secure its own JMX connections, and it uses either auto-generated or deployer (Operator)-supplied credentials to ensure that only authorized users can connect to ContainerJFR, including when asking it to perform introspective operations on itself. When deployed by the Operator, the JMX username is simply "containerjfr"; the password is generated by the Operator, stored in a Secret, and then "mounted" into an environment variable when the ContainerJFR container is created.
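A sketch of how such a Secret-backed credential might be wired into the container's environment follows. The env var names and the Secret name/key are illustrative assumptions, not the Operator's exact choices:

```go
import corev1 "k8s.io/api/core/v1"

// Exposing the Operator-generated JMX credentials to the ContainerJFR
// container via a Secret-backed environment variable.
var jmxEnv = []corev1.EnvVar{
	{Name: "CONTAINER_JFR_RJMX_USER", Value: "containerjfr"},
	{
		Name: "CONTAINER_JFR_RJMX_PASS",
		ValueFrom: &corev1.EnvVarSource{
			SecretKeyRef: &corev1.SecretKeySelector{
				LocalObjectReference: corev1.LocalObjectReference{
					// Hypothetical name for the Operator-created Secret.
					Name: "containerjfr-jmx-auth",
				},
				Key: "CONTAINER_JFR_RJMX_PASS",
			},
		},
	},
}
```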

ContainerJFR and the associated Operator only monitor the OpenShift namespace they are deployed within, and can only connect to target JVMs within that namespace - this is enforced by OpenShift's networking setup. End user administrators and developers can therefore be certain which of their JVMs are visible to ContainerJFR, and thus which JVMs' data they should be mindful of.

Once ContainerJFR has established a JMX connection to a target application, its primary purpose is to enable JFR recordings on the target JVM and expose them to the end user. These recordings can be transferred from the target JVM back to ContainerJFR over the JMX connection. ContainerJFR does this for four purposes:

  1. to generate JMC HTML Rules Reports of the JFR contents, which are generated and kept in-memory, then served to clients over HTTPS
  2. to copy the JFR contents into a file saved in its OpenShift PersistentVolumeClaim ("archive")
  3. to stream a snapshot of the JFR contents over HTTPS to a requesting client's GET request
  4. to upload a snapshot of the JFR contents using HTTPS POST to the jfr-datasource

("archived" JFR copies can also be streamed to clients over HTTPS or POSTed to jfr-datasource, and HTML Rules Reports can also be made of them)

Here, "the client" may refer to an end user's browser when using ContainerJFR's web interface, or may be the end user using a direct WebSocket or HTTP(S) client (ex. websocat or curl), or may be an OpenShift Operator controller acting as an automated client. All of these cases are handled identically by ContainerJFR.

TODO describe how jfr-datasource handles JFR files received by POST request and what information is exposed on API endpoints, even though these are hidden behind OpenShift networking

ContainerJFR Authz Specifics

When deployed in OpenShift, the ContainerJFR container instance detects this scenario and expects clients to provide a Bearer token on all Command Channel (WSS) connections, as well as on any HTTPS API connections that can expose information about target applications within the cluster (the only requests exempt from authz checks are those for static resources such as web-client assets). These tokens are the ones issued by OpenShift OAuth itself, ie. for the user's account on that OpenShift instance/cluster. On each HTTPS request, ContainerJFR receives the token and sends its own request to the internal OpenShift OAuth server to validate it. If OpenShift OAuth validates the token, the request is accepted; if validation fails, or the user provides no token, the request is rejected with a 401.

Likewise, for each new WSS WebSocket connection, ContainerJFR expects the client to provide a token as part of the WebSocket SubProtocol header. This token is passed to the OpenShift OAuth server in the same way described above. If the token validation fails, the server replies with an appropriate closure status code and message after the client sends its first message frame.
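A client-side sketch of that WSS handshake follows. The SubProtocol encoding, URL path, and message payload are illustrative assumptions, not ContainerJFR's exact formats:

```go
package main

import (
	"log"
	"os"

	"github.com/gorilla/websocket"
)

func main() {
	// Placeholder: the user's OpenShift OAuth token.
	token := os.Getenv("OPENSHIFT_TOKEN")
	dialer := websocket.Dialer{
		// The token travels in the Sec-WebSocket-Protocol header; the
		// encoding shown here is purely illustrative.
		Subprotocols: []string{"base64url.bearer.authorization." + token},
	}
	conn, _, err := dialer.Dial(
		"wss://containerjfr-my-project.apps.example.com/command", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// If the token fails validation, ContainerJFR replies with a closure
	// status code and message after this first message frame is sent.
	// (Illustrative payload.)
	if err := conn.WriteMessage(websocket.TextMessage,
		[]byte(`{"command":"list"}`)); err != nil {
		log.Fatal(err)
	}
}
```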

The specific method for verifying an OAuth token is to take the client-provided token, construct a new OpenShift client instance using it, and perform a request with ContainerJFR masquerading as the actual client. Currently, the request performed is an attempt to list the Routes within the Namespace. This is likely to change to a more robust criterion in the future.
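A minimal sketch of that check, assuming the OpenShift client-go Route clientset (TLS configuration and error detail omitted for brevity):

```go
import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"

	routeclient "github.com/openshift/client-go/route/clientset/versioned"
)

// validateToken builds an OpenShift client authenticated with the client's
// own token and attempts to list Routes in the namespace. Note that recent
// client-go versions also require a context.Context argument on List.
func validateToken(apiServerURL, namespace, token string) bool {
	cfg := &rest.Config{
		Host:        apiServerURL,
		BearerToken: token,
	}
	client, err := routeclient.NewForConfig(cfg)
	if err != nil {
		return false
	}
	// An invalid or underprivileged token makes this call fail, and the
	// original request is then rejected with a 401.
	_, err = client.RouteV1().Routes(namespace).List(metav1.ListOptions{})
	return err == nil
}
```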

TODO describe non-OpenShift cases

Grafana Authz

The Operator configures the Grafana container to use the default admin username but overrides the default password. The Operator generates a random password as below (at the time of writing):

```go
import (
	"math/rand"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	rhjmcv1alpha1 "github.com/rh-jmc-team/container-jfr-operator/pkg/apis/rhjmc/v1alpha1"
)

// NewGrafanaSecretForCR creates the Secret holding the Grafana admin
// credentials: the default "admin" username and a randomly generated
// 20-character password.
func NewGrafanaSecretForCR(cr *rhjmcv1alpha1.ContainerJFR) *corev1.Secret {
	return &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      cr.Name + "-grafana-basic",
			Namespace: cr.Namespace,
		},
		StringData: map[string]string{
			"GF_SECURITY_ADMIN_USER":     "admin",
			"GF_SECURITY_ADMIN_PASSWORD": GenPasswd(20),
		},
	}
}

// GenPasswd returns a random string of the given length, drawn from a
// 64-character alphabet and seeded from the current time.
func GenPasswd(length int) string {
	rand.Seed(time.Now().UnixNano())
	chars := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_"
	b := make([]byte, length)
	for i := range b {
		b[i] = chars[rand.Intn(len(chars))]
	}
	return string(b)
}
```

(ie: a 20-character string matching [a-zA-Z0-9\-_]{20})

This generated password is stored in a Kubernetes Secret, which is then "mounted" into the Grafana container as an environment variable at startup time. The same Secret is re-read by another controller within the Operator after the Grafana container has started, so that the Operator can make API requests to the Grafana container to configure it with a default dashboard and to add the jfr-datasource datasource definition/URL.
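A sketch of that follow-up configuration call, using Grafana's documented HTTP datasource API. The datasource type and the cluster-internal jfr-datasource URL here are assumptions:

```go
import (
	"bytes"
	"net/http"
)

// configureDatasource registers jfr-datasource with Grafana using the
// admin credentials re-read from the Secret.
func configureDatasource(grafanaURL, adminUser, adminPass string) error {
	// Illustrative payload: jfr-datasource speaks the SimpleJson
	// datasource protocol, proxied through Grafana.
	payload := []byte(`{
		"name":   "jfr-datasource",
		"type":   "grafana-simple-json-datasource",
		"url":    "http://containerjfr-jfr-datasource:8080",
		"access": "proxy"
	}`)
	req, err := http.NewRequest("POST", grafanaURL+"/api/datasources",
		bytes.NewReader(payload))
	if err != nil {
		return err
	}
	req.SetBasicAuth(adminUser, adminPass) // values from the Secret
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}
```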

Operator Connections to ContainerJFR

Operator connections to its "child" ContainerJFR instance are made solely via WSS, identical to the client connection flow outlined above. The Operator passes its ServiceAccount API token to ContainerJFR via the WebSocket SubProtocol, and ContainerJFR uses this token to masquerade as the ServiceAccount and verify its permissions within the cluster and namespace.
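Sketched below, reusing the same illustrative SubProtocol encoding as in the earlier client sketch; the in-pod token path is the standard Kubernetes ServiceAccount mount:

```go
import (
	"io/ioutil"
	"strings"

	"github.com/gorilla/websocket"
)

// dialAsServiceAccount reads the Operator's ServiceAccount token from the
// standard in-pod path and presents it over the WebSocket SubProtocol.
func dialAsServiceAccount(url string) (*websocket.Conn, error) {
	raw, err := ioutil.ReadFile(
		"/var/run/secrets/kubernetes.io/serviceaccount/token")
	if err != nil {
		return nil, err
	}
	token := strings.TrimSpace(string(raw))
	dialer := websocket.Dialer{
		// Illustrative encoding, as above.
		Subprotocols: []string{"base64url.bearer.authorization." + token},
	}
	conn, _, err := dialer.Dial(url, nil)
	return conn, err
}
```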

Once the Operator has obtained information about the target JVM(s) within the namespace, it copies that information into Custom Resources it owns. This includes only basic details, such as the names, durations, and states of any recordings active in the target JVM(s), plus the URL to download each recording. That URL is a direct link to the ContainerJFR Route at a path which allows the client to download the recording; this path is also secured using HTTPS Bearer token authentication, so the end user client must supply their own account's token in order to retrieve the recording file. Any information contained within the Custom Resources is secured using OpenShift RBAC policy, similar to other built-in Resource types. Notably, the Operator itself never receives the JFR file, in whole or in part, so the only information available via this API is information obtained via the command channel, which a user could equally view using the web-client or a direct WebSocket connection.
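A hypothetical sketch of the shape of that copied information; the type and field names are illustrative, not the Operator's actual API types:

```go
// RecordingInfo models the basic recording details copied into the
// Operator-owned Custom Resources.
type RecordingInfo struct {
	Name        string // recording name in the target JVM
	Duration    string // configured duration, or continuous
	State       string // e.g. "RUNNING" or "STOPPED"
	DownloadURL string // ContainerJFR Route path, Bearer-token protected
}
```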
