
Server runtime documentation


Overview

This page documents the SoftLayer-NetflixOSS changes required to the core runtime components that Adrian Cockcroft calls "NetflixOSS Instance Libraries" in [his presentation on SoftLayer, IBM Java and WebSphere Liberty Profile](http://www.slideshare.net/adrianco/netflixoss-meetup) (see slide 21).

IBM Java

For the most part, using IBM Java under NetflixOSS (and under the WebSphere Liberty Profile) was a straightforward exercise, thanks to the JVM languages and Java standards on which NetflixOSS is based. We did run into some small issues with NetflixOSS' use of Groovy and differences in Groovy behavior between Open Java and IBM Java. For the instance libraries, the only real difference between Open Java and IBM Java is memory tuning. Extensive documentation on tuning IBM Java is available online.
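As a concrete illustration, memory tuning for IBM Java is done through JVM options such as the following; the values here are examples of ours, not recommendations, and the right settings depend on the application:

```
# Illustrative IBM J9 options; the values are placeholders, not advice.
# Initial and maximum heap size:
-Xms512m
-Xmx512m
# Generational concurrent garbage-collection policy:
-Xgcpolicy:gencon
```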

WebSphere Liberty Profile

We will go into each instance library run on top of the WebSphere Liberty Profile (WLP), but this section covers WLP in general. WLP is an IBM-supported lightweight application server that we believe is well suited to cloud deployment scenarios, for two major reasons. First, the configuration is simple and works well with elastic/ephemeral deployments, as no hostnames, IPs, etc. are coded into the simple server.xml. Second, the server follows a "just enough" design: only the features you select are loaded at server start (versus a full, just-in-case JEE profile). As a result, applications on new instances start in seconds and have memory footprints that don't get in the way of the application's true memory needs. You can read more on WLP on wasdev or from its Chief Architect, Ian Robinson.
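As an illustration of these points, a minimal server.xml can look like the sketch below; the feature and port choices are ours, not taken from the project's configuration:

```xml
<!-- Minimal illustrative Liberty server.xml; features and ports are examples. -->
<server description="sample application server">
    <!-- Only the features listed here are loaded at server start. -->
    <featureManager>
        <feature>servlet-3.0</feature>
    </featureManager>
    <!-- No hostnames or IPs are hard-coded; host="*" binds all interfaces. -->
    <httpEndpoint id="defaultHttpEndpoint" host="*" httpPort="9080" httpsPort="9443"/>
</server>
```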

Archaius

For bootstrapping Archaius we used the supported -Darchaius.deployment.applicationId and -Darchaius.deployment.environment system properties. We pass these to the WLP server via a jvm.properties file as documented in github.
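As a sketch, that file would contain lines along the following; the applicationId and environment values are illustrative placeholders, not the project's actual settings:

```
# Illustrative Archaius bootstrap properties; the values are placeholders.
-Darchaius.deployment.applicationId=acmeair
-Darchaius.deployment.environment=test
```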

Karyon (and Governator)

The Karyon "container" component provides bootstrapping, a consistent management console, automatic Eureka registration, and Archaius property configuration. We used the standard Karyon Guice context listener to have our applications bootstrap Karyon. We configured this in web.xml as shown in github.
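Wiring the listener in looks roughly like the following web.xml fragment; the listener class name is the Karyon 1.x one as we recall it, so verify it against the project's actual web.xml on github:

```xml
<!-- Bootstrap Karyon (and with it Guice/Governator, Eureka registration,
     and Archaius) when the web application starts. Listener class name
     assumed from Karyon 1.x; verify against the project's web.xml. -->
<listener>
    <listener-class>com.netflix.karyon.server.guice.KaryonGuiceContextListener</listener-class>
</listener>
```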

A fix was needed here to make the Karyon console available. Karyon depends on Jersey and its classpath scanning for REST resources. Parts of this classpath scanning check whether a "module" being scanned is a directory or a compressed archive (such as a war, zip, or jar). To tell the difference, the Java code calls ClassLoader.getResource and then looks at the returned URL's protocol. In WLP (and other application servers) this protocol comes back unique to the server: instead of returning "jar", WLP returns "wsjar". WLP uses this information internally to understand that the jar has features not available in plain jars. Unfortunately, this breaks Jersey scanning, as it doesn't know how to handle "wsjar" URLs. To work around this, when we deploy the war, we deploy it as a directory rather than a war, and we similarly deploy the two specific Karyon jars under WEB-INF/lib as directories (not jars). This is documented as an issue in github. Long term it would be good to fix the Jersey classpath scanner or use the standard jaxrs implementation in WLP (more on that later).
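A minimal sketch of the protocol check at the heart of the problem (our paraphrase, not the actual Jersey scanner code; the resource path is illustrative):

```java
import java.net.URL;

// Paraphrase of the check Jersey's scanner performs on a scanned module.
public class ProtocolCheck {
    public static void main(String[] args) {
        URL url = ProtocolCheck.class.getClassLoader()
                .getResource("com/netflix/karyon/");
        if (url != null) {
            // Prints "jar" or "file" on most servers; on WLP it prints
            // "wsjar", a protocol the Jersey 1.x scanner does not expect,
            // so the module is treated as neither a directory nor a jar.
            System.out.println(url.getProtocol());
        }
    }
}
```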

Similarly, Governator has the same issue with classpath scanning. The simple but effective fix we applied to Governator when running in WLP is shown in github.

Eureka client

The Eureka client code used to register web applications and micro services with the Eureka server was, by default, registering the non-fully-qualified hostname of the instance running the service. This caused issues in SoftLayer, as that hostname wasn't resolvable via DNS from all other instances in SoftLayer. We changed the Eureka client to register by IP address instead, as documented in github. This is something we'd like to discuss with the community. Even on other clouds (AWS EC2), we've seen this cause issues, and it seems using the IP address might be a better approach.
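The essence of the change is to resolve and register the instance's IP address rather than its short hostname. The helper below is our illustration of the idea, not the actual Eureka client patch:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustration only: resolve the address an instance could register
// with Eureka, preferring the IP over the short hostname.
public class RegistrationAddress {
    public static String resolve() {
        try {
            // getHostAddress() returns the textual IP (e.g. "10.0.0.5"),
            // which stays reachable even when the short hostname is not
            // resolvable via DNS from other instances.
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalStateException("Cannot determine local address", e);
        }
    }
}
```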

Hystrix

We used Hystrix to demonstrate the value of fast failure between the main web application and the backend authentication service. We used the thread-pool-based isolation strategy of Hystrix. We could have used the semaphore-based strategy, but wanted full protection regardless of networking timeouts. Thread-pool isolation can cause issues in WLP depending on the needs for context propagation (transaction/security), etc. Also, it makes the isolation thread pool less controlled by, and therefore less visible to, the WLP server (for problem determination and application lifecycle control). At this point, this isn't a recommended configuration under WLP unless you understand the implied risks. In the past, the JEE world has dealt with similar in-app-server asynchronous context propagation via Async Beans.
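For concreteness, a thread-pool-isolated command wrapping the call to the authentication service looks roughly like this; the class, group key, and method names are our invention, not the project's code:

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

// Hypothetical command; all names here are illustrative.
public class AuthCheckCommand extends HystrixCommand<Boolean> {
    private final String token;

    public AuthCheckCommand(String token) {
        // Thread-pool isolation is Hystrix's default: run() executes on a
        // Hystrix-managed thread, not the WLP request thread, which is why
        // JEE context (transactions, security) does not propagate to it.
        super(HystrixCommandGroupKey.Factory.asKey("AuthService"));
        this.token = token;
    }

    @Override
    protected Boolean run() throws Exception {
        // The real remote call (e.g. via Ribbon) would go here.
        return callAuthService(token);
    }

    @Override
    protected Boolean getFallback() {
        // Fail fast when the auth service is slow or unavailable.
        return Boolean.FALSE;
    }

    private boolean callAuthService(String token) {
        return true; // placeholder for the actual REST call
    }
}
```

Executing `new AuthCheckCommand(token).execute()` routes the call through the isolation thread pool, with the fallback applied on timeout, error, or a full pool.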

There is an excellent thread started on the Hystrix user group asking for support of Hystrix in commercial application servers. Ben does a great job of not only pointing out these issues but also suggesting a starting point where we could integrate better context propagation. We think this is a great starting point for providing support similar to what we have done in the past with Async Beans.

Ribbon

We used Ribbon with the Eureka client to do cross-application REST communication. You can see the calls within our Hystrix commands in github. Ribbon depends on Jersey as its JAX-RS implementation. In WLP, the standard jaxrs-1.1 support is based upon Apache Wink. If you try to start a server that runs application REST services with the standard WLP jaxrs-1.1 support and use Ribbon, you will get class collisions between the implementations (both at the jaxrs level and at the underlying Jackson JSON level). For now, we have worked around this by using Jersey in our application. You can do this in WLP by not turning on the jaxrs-1.1 feature and instead enabling basic servlet-3.0 functionality, as documented in our server.xml on github. Also, you have to use the non-JEE-standard way to initialize Jersey, as documented in our web.xml on github.
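For reference, the non-JEE-standard Jersey 1.x bootstrap in web.xml looks roughly like the following; the servlet name, resource package, and URL pattern are placeholders, so see the project's actual web.xml on github:

```xml
<!-- Jersey 1.x servlet bootstrap; names and paths are placeholders. -->
<servlet>
    <servlet-name>JerseyServlet</servlet-name>
    <servlet-class>com.sun.jersey.spi.container.servlet.ServletContainer</servlet-class>
    <init-param>
        <!-- Package(s) Jersey scans for REST resource classes. -->
        <param-name>com.sun.jersey.config.property.packages</param-name>
        <param-value>com.example.rest</param-value>
    </init-param>
    <load-on-startup>1</load-on-startup>
</servlet>
<servlet-mapping>
    <servlet-name>JerseyServlet</servlet-name>
    <url-pattern>/rest/*</url-pattern>
</servlet-mapping>
```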

Long term, it would be interesting to consider whether we should rework Ribbon to be based on Apache Wink. In some unofficial, back-of-the-envelope testing, Apache Wink seemed faster with the Acme Air benchmark. Also, from a WLP perspective, we need to look at whether supporting both jaxrs implementations is viable. Either way, we've already looked at why the implementations collide and are thinking about changes that might make this easier regardless of which library someone uses.

Baked AMIs

We are not yet using Aminator. However, we have followed an approach of pre-baking images of our applications built on top of WLP. Once we have a workable code level, we take the gradle-built war and install it onto WLP using a base "WLP instance image" and the server.xml in github. We then capture this as an image and later deploy it via Asgard.

Other areas not yet ported

We have considered, but not yet adopted (due to priority), Blitz4j, Servo, and Atlas for monitoring. We have started to look at RxJava, and you can expect an eventual Acme Air implementation based on this style of programming. We have completed work in a branch replacing the back-end data store with Cassandra to evaluate the Astyanax client. We would like to expand that branch to consider EVCache as well.