The purpose of this repository is to provide hands-on practice for my article series on building a modern scalable system. With this PoC (proof-of-concept) sample, we can learn how to build and run a microservices architecture (MSA) application based on an ecosystem comprising the excellent technologies offered by Spring-Cloud, HashiCorp, Elastic, Netflix, etc.
├── modules
│ ├── add.svc // add operator service, which will be called by calculator-ui
│ │ ├── build.gradle
│ │ ├── out
│ │ └── src
│ ├── api.gateway // api.gateway based on Zuul
│ │ ├── build.gradle
│ │ ├── out
│ │ └── src
│ ├── calculator.ui // front-end service (edge service), which will be exposed on the public cloud
│ │ ├── build.gradle
│ │ ├── out
│ │ └── src
│ ├── ifw.lib // a simple prototype library implementation to demonstrate an AOP-based invocation framework with annotated QoS support (just for demo, not production qualified)
│ │ ├── build.gradle
│ │ ├── out
│ │ └── src
│ ├── springboot.admin // spring-boot admin service
│ │ ├── build.gradle
│ │ ├── out
│ │ └── src
│ ├── sub.svc // subtract operator service, which will be called by calculator-ui
│ │ ├── build.gradle
│ │ ├── out
│ │ └── src
│ ├── turbine.aggr // turbine aggregation service
│ │ ├── build.gradle
│ │ ├── out
│ │ └── src
│ └── zipkin.server // Zipkin server (with HTTP and Kafka collectors, in-memory span storage)
│ ├── build.gradle
│ ├── out
│ └── src
| MSA Capability | Feature | Tech-Deps/Vendors | Related Module |
|---|---|---|---|
| Dev | APIDoc | Swagger | calculator.ui, add.svc, sub.svc |
| | TDD | Consumer Driven Contracts | calculator.ui, add.svc, sub.svc |
| | AOP | Spring AOP | ifw.lib (customized AOP annotation), calculator.ui |
| | Env-Spec Config | Spring-profiles | All |
| Intercommunication | Edge-Service (API) Gateway | Zuul | api.gateway |
| | Service Discovery/Registry | Consul | All |
| | Load Balance | Consul, Spring, Ribbon | api.gateway, calculator.ui |
| | Event-Driven/Messaging | Kafka (transactional message) | ifw.lib (counter interceptor), calculator.ui |
| | Circuit Breakers | Spring, Hystrix | api.gateway, calculator.ui |
| | Distributed Tracing | OpenTracing, Kafka, Zipkin | calculator.ui (HTTP), add.svc (HTTP), sub.svc (HTTP), api.gateway (Kafka), zipkin.server |
| Service Orchestration | Service Scheduling | Nomad | Hybrid deployables support, standalone, container |
| | Centralized Logging | Nomad (sidecar task), Filebeat, Kafka, ELK | calculator.ui (local file), add.svc (local file), sub.svc (local file), api.gateway (Kafka) |
| Operation | Service Admin | Springboot-Admin, Spring-actuator | All |
| | Service allocation status | Nomad, Hashi-ui | All |
| | Distributed Tracing Reporter | Zipkin-ui | All |
| | Centralized Logging Query | ELK, Kibana | All |
| | Overall system (Linux) profiling | nmon | All |
| | Process full-stack (Java, C, kernel) profiling | perf_event, FlameGraph | All |
| | Java heap analysis | IBM/Oracle JDK | |
| | Java thread (core dump) analysis | IBM/Oracle JDK | |
| | Web Perf Testing | wrk | |
Notable:
CI/CD refers to the combined practices of continuous integration and continuous delivery.
In order to keep the PoC sample as simple as possible, I didn't include CI/CD features here.
cd <CONSUL_HOME>
./bin/consul agent -dev
cd <KAFKA_1.0_HOME>
./bin/zookeeper-server-start.sh ./config/zookeeper.properties
./bin/kafka-server-start.sh ./config/server.properties
In this sample, the api-gateway's distributed tracing data is reported to the Kafka topic zipkin; we can monitor that topic and verify the result:
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic zipkin
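The repo wires this up through the api-gateway's own Spring configuration; purely as an illustration of the mechanism (the class, bean and server names below are hypothetical, not taken from this repo), a span reporter publishing to the zipkin Kafka topic could be built with the zipkin2 Kafka sender roughly like this:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import zipkin2.Span;
import zipkin2.reporter.AsyncReporter;
import zipkin2.reporter.Reporter;
import zipkin2.reporter.kafka11.KafkaSender;

// Illustrative sketch only: report spans to the "zipkin" Kafka topic monitored above.
// Assumes zipkin-reporter and zipkin-sender-kafka11 are on the classpath.
@Configuration
public class KafkaSpanReporterConfig {

    @Bean
    public Reporter<Span> spanReporter() {
        KafkaSender sender = KafkaSender.newBuilder()
                .bootstrapServers("localhost:9092")   // Kafka broker used in this sample
                .topic("zipkin")                      // topic consumed by the Zipkin collector
                .build();
        return AsyncReporter.create(sender);
    }
}
```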
Either start Zipkin from the Zipkin binary:
cd <ZIPKIN_SRC_HOME>
./mvnw -DskipTests --also-make -pl zipkin-server clean install
KAFKA_BOOTSTRAP_SERVERS=localhost:9092 java -jar ./zipkin-server/target/zipkin-server-2.11.2-SNAPSHOT-exec.jar
Please refer to zipkin native kafka collector auto-configure support for more details
or launch the Zipkin server via the embedded Spring Boot module:
cd modules/zipkin.server
gradle bootRun
cd modules/springboot.admin
gradle bootRun
cd modules/api.gateway
gradle bootRun
cd modules/add.svc
gradle bootRun
cd modules/sub.svc
gradle bootRun
cd modules/calculator.ui
gradle bootRun
curl -X POST "http://localhost:2809/calculator-ui/api/v1/compute" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"expression\": \"1+1+2+3+4+5+6-1-2-3-4-5-6+9+8-4-5\"}"
calculator.lua:
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/json"
wrk.body = "{ \"expression\": \"1+1+2+3+4+5+6-1-2-3-4-5-6+9+8-4-5\"}"
wrk -t 200 -c 200 -s ./scripts/calculator.lua -d 60s --latency http://localhost:2809/calculator-ui/api/v1/compute
The goal of aspect oriented programming (AOP) is to provide a general mechanism for weaving additional logic into program code without making changes to the code itself, resulting in very loose coupling between the additional logic and the actual business logic. //TODO
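To make the idea concrete, here is a minimal Spring AOP sketch in the spirit of ifw.lib (the @Timed annotation and TimingAspect class are invented for illustration; they are not the actual ifw.lib API): a marker annotation flags a method, and an aspect weaves latency recording around every annotated invocation without touching the business code.

```java
package demo; // hypothetical package, for illustration only

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;

// Hypothetical QoS marker annotation (illustration only)
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
@interface Timed {
}

// Aspect woven around every @Timed method; callers and callees stay unchanged
@Aspect
@Component
class TimingAspect {

    @Around("@annotation(demo.Timed)")
    public Object measure(ProceedingJoinPoint pjp) throws Throwable {
        long start = System.nanoTime();
        try {
            return pjp.proceed();   // run the actual business logic
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("%s took %d ms%n", pjp.getSignature(), elapsedMs);
        }
    }
}
```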
Please refer to Hystrix for more details about how Hystrix works.
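For orientation, downstream calls in a setup like this are typically wrapped with the javanica @HystrixCommand annotation (together with @EnableCircuitBreaker on the application class); the sketch below uses invented class, URL and method names purely to illustrate the fallback mechanism, it is not the actual calculator.ui code.

```java
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
class AddClient {

    private final RestTemplate restTemplate = new RestTemplate();

    // If the downstream add service fails or times out, Hystrix opens the
    // circuit and routes callers to the fallback instead of cascading the failure.
    @HystrixCommand(fallbackMethod = "addFallback")
    public int add(int a, int b) {
        return restTemplate.getForObject(
                "http://localhost:8081/api/v1/add?a={a}&b={b}", Integer.class, a, b);
    }

    // Degraded answer returned while the circuit is open
    int addFallback(int a, int b) {
        return 0;
    }
}
```

The metrics emitted by such commands are what the Hystrix/Turbine dashboard below visualizes.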
- Open browser and input url http://localhost:9090/hystrix
- Input dashboard stream url as below:
[1] http://localhost:9090/turbine.stream?cluster=api-gateway
[2] http://localhost:9090/turbine.stream?cluster=calculator-ui
Microservices architecture faces certain challenges that are not common in monolithic systems. One of them is the traceability of an end-to-end transaction that is executed in a distributed environment and that can potentially involve the invocation of a large number of diverse applications and services. In case of performance degradation or a fatal exception experienced in the end-to-end solution, identification of the bottleneck component or failure point becomes more and more difficult. Distributed tracing is used to resolve the problems in this area.
In this PoC sample, I'm using opentracing-spring-cloud instead of spring-cloud-sleuth, as it brings more benefits for cross-programming-language environments, as well as for switching from Zipkin to other tracers, e.g. Jaeger.
opentracing-spring-cloud contains auto-configurations which can be enabled to instrument and trace the following Spring Boot projects (a minimal tracer wiring sketch follows the list below):
- Spring Web (RestControllers, RestTemplates, WebAsyncTask)
- @Async, @Scheduled, Executors
- WebSocket STOMP
- Feign, HystrixFeign
- Hystrix
- JMS
- JDBC
- Mongo
- Zuul
- RxJava
- Standard logging - logs are added to active span
- Spring Messaging - trace messages being sent through Messaging Channels
- RabbitMQ
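To give an idea of what that wiring looks like, the starter only needs an io.opentracing.Tracer bean in the context; below is a minimal sketch using the Brave/Zipkin bridge (the endpoint, service name and choice of the Brave bridge are assumptions for illustration, not a description of this repo's exact configuration):

```java
import brave.Tracing;
import brave.opentracing.BraveTracer;
import io.opentracing.Tracer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import zipkin2.reporter.AsyncReporter;
import zipkin2.reporter.okhttp3.OkHttpSender;

@Configuration
public class TracingConfig {

    // Expose an OpenTracing Tracer; opentracing-spring-cloud picks it up and
    // instruments RestTemplate, Zuul, Hystrix, etc. automatically.
    @Bean
    public Tracer tracer() {
        OkHttpSender sender = OkHttpSender.create("http://localhost:9411/api/v2/spans");
        Tracing braveTracing = Tracing.newBuilder()
                .localServiceName("calculator-ui")
                .spanReporter(AsyncReporter.create(sender))
                .build();
        return BraveTracer.create(braveTracing);
    }
}
```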
Below is the description from the zipkin architecture doc.
Here's an example sequence of http tracing where user code calls the resource /foo. This results in a single span, sent asynchronously to Zipkin after user code receives the http response.
┌─────────────┐ ┌───────────────────────┐  ┌─────────────┐  ┌──────────────────┐
│ User Code   │ │ Trace Instrumentation │  │ Http Client │  │ Zipkin Collector │
└─────────────┘ └───────────────────────┘  └─────────────┘  └──────────────────┘
       │                 │                         │                 │
           ┌─────────┐
       │ ──┤GET /foo ├─▶ │ ────┐                   │                 │
           └─────────┘         │ record tags
       │                 │ ◀───┘                   │                 │
                           ────┐
       │                 │     │ add trace headers │                 │
                           ◀───┘
       │                 │ ────┐                   │                 │
                               │ record timestamp
       │                 │ ◀───┘                   │                 │
                             ┌─────────────────┐
       │                 │ ──┤GET /foo         ├─▶ │                 │
                             │X-B3-TraceId: aa │     ────┐
       │                 │   │X-B3-SpanId: 6b  │   │     │           │
                             └─────────────────┘         │ invoke
       │                 │                         │     │ request   │
                                                         │
       │                 │                         │     │           │
                                 ┌────────┐          ◀───┘
       │                 │ ◀─────┤200 OK  ├─────── │                 │
                           ────┐ └────────┘
       │                 │     │ record duration   │                 │
            ┌────────┐     ◀───┘
       │ ◀──┤200 OK  ├── │                         │                 │
            └────────┘       ┌────────────────────────────────┐
       │                 │ ──┤ asynchronously report span     ├────▶ │
                             │                                │
                             │{                               │
                             │  "traceId": "aa",              │
                             │  "id": "6b",                   │
                             │  "name": "get",                │
                             │  "timestamp": 1483945573944000,│
                             │  "duration": 386000,           │
                             │  "annotations": [              │
                             │--snip--                        │
                             └────────────────────────────────┘
Trace instrumentation reports spans asynchronously to prevent delays or failures relating to the tracing system from delaying or breaking user code.
In Spring-Cloud, the RESTful API actually plays the key role in the ecosystem. Why and how do we document our RESTful API?
The most important design principle for RESTful services is to think about the consumer of the API. Unlike SOAP-based web services, REST does not provide a spec or a contract like WSDL in SOAP. Indeed, REST gives you the flexibility to choose a proper documentation format and approach. But that does not mean "no documentation". We do need to document our API.
- The poor man's option is to maintain the documentation manually. But that leaves you with a lot of boring work keeping the doc in sync with the latest code, plus the versioning work.
- Another option is to generate the documentation from code. Spring-Cloud provides an easy way to do this, and the PoC sample takes this approach.
There are multiple approaches to documenting your RESTful API:
- [WADL](https://javaee.github.io/wadl/)
- RESTDocs
- Swagger or OpenDocs
Swagger appeared in the last couple of years and is now the most popular REST API documentation standard. We will use it in our sample for APIDoc generation.
To generate the swagger document, springfox currently provides a popular solution for spring-boot. We just need to add a dependency on springfox-swagger2, configure the Docket, and use the appropriate annotations on the API controllers, methods and models; the swagger document will then be generated automatically.
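As a concrete illustration (the package name below is a placeholder, not necessarily the one used in this repo), a typical springfox configuration looks roughly like this:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import springfox.documentation.builders.PathSelectors;
import springfox.documentation.builders.RequestHandlerSelectors;
import springfox.documentation.spi.DocumentationType;
import springfox.documentation.spring.web.plugins.Docket;
import springfox.documentation.swagger2.annotations.EnableSwagger2;

@Configuration
@EnableSwagger2
public class SwaggerConfig {

    // Scan the annotated REST controllers and expose the generated
    // swagger document at /v2/api-docs.
    @Bean
    public Docket api() {
        return new Docket(DocumentationType.SWAGGER_2)
                .select()
                .apis(RequestHandlerSelectors.basePackage("com.example.calculator")) // placeholder package
                .paths(PathSelectors.any())
                .build();
    }
}
```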
Access it via:
http://<hostname>:<port>/v2/api-docs
If we want a visual UI to browse and execute the API with test requests, just add a dependency on springfox-swagger-ui and open the link below in a browser:
http://<hostname>:<port>/swagger-ui.html
//TODO
Use nmon to collect the basic system profiling data. When doing the benchmark testing, run the nmon command in the background for overall system profiling. Below is an example of a visualized performance analysis result (charts in a spreadsheet) translated from the nmon profiling data: hello-springcloud_180918_0439.nmon.xlsx
- Have the perf_event tool installed on your Linux environment
  # On Ubuntu
  sudo apt-get install -y linux-tools-common
  The sample was verified on Ubuntu with the kernel below:
  Linux lizh-laptop 4.4.0-72-generic #93~14.04.1-Ubuntu SMP Fri Mar 31 15:05:15 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
- Use JDK 8 (>= 8u60) or JDK 9 to support -XX:+PreserveFramePointer
  $ java -version
  java version "1.8.0_60"
  Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
  Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
- Set JAVA_HOME properly
  export JAVA_HOME=<YOUR_JDK_HOME>
- Install perf-map-agent
  perf-map-agent is the instrumentation tool that translates the Java symbols for perf_event, i.e. perf_event reads the generated map to resolve the VMA (Virtual Memory Address) to the corresponding Java symbols and produce a human readable report.
  sudo apt-get install -y cmake
  git clone https://github.com/jrudolph/perf-map-agent
  cd perf-map-agent
  cmake .
  make
  attach-main.jar and libperfmap.so get compiled into the out folder.
- Create command links in <HOME>/bin
  cd <perf-map-agent_HOME>/bin
  ./create-links-in <HOME>/bin
  A soft link for the perf-java-flames command will be created in <HOME>/bin, so we can use this command anywhere without specifying the full path.
- Install FlameGraph
  git clone https://github.com/brendangregg/FlameGraph
  export FLAMEGRAPH_DIR=<FlameGraph_HOME>
- Run the Java process with -XX:+PreserveFramePointer, e.g:
  cd <modules>/calculator.ui
  java -XX:+PreserveFramePointer -jar ./out/libs/calculator.ui-1.0.0.jar
- Check it is enabled (on Linux)
  ps wwp `pgrep -n java` | grep PreserveFramePointer
- Profile and generate a flame graph for the Java process
perf-java-flames <pid>
- Sample result
- Oracle JVM tools: jmap and ibm-heap-analyzer
- IBM JVM
  To enable signal-based Java heap dumps, the IBM_HEAPDUMP=TRUE environment variable or the appropriate JAVA_DUMP_OPTS must be set, e.g:
# Manually trigger a heap dump by sending a SIGQUIT signal to JVM process
IBM_HEAPDUMP=TRUE ~/opt/ibm-jdk-1.8.0/bin/java -jar ./add.svc-1.0.0.jar # the pid is 21046
kill -3 21046
The heapdump file location will be printed in the java process stdout
JVMDUMP032I JVM requested Heap dump using 'heapdump.20180918.195336.21046.0001.phd' in response to an event
JVMDUMP010I Heap dump written to heapdump.20180918.195336.21046.0001.phd
- For the Oracle JVM, we commonly have 3 methods for a thread (core) dump on Linux:
- Method-1
# make sure you have gdb installed, which provides the gcore command to do the core dump.
gcore <java pid> # e.g: pid: 7737
jstack -m -l `which java` ./core.7737 > java-7737.core
Result: java-7737.core
Note:
Before using gcore to generate a core dump, run the ulimit -c command to check or set the core file size limit. Make sure the limit is set to unlimited; otherwise the core file could be truncated.
jstack <executable> <core> is also commonly used in postmortem situations.
- Method-2: you can use jstack to attach to the running Java process for the thread dump
jstack -l -m -F <pid>
- Method-3
kill -3 <pid> # print the dump info in the java process stdout
- For the IBM JVM, the core dump file contains much more content, and we can use the IBM Thread and Monitor Dump Analyzer for the analysis
# Manually trigger a javacore dump by sending a SIGQUIT signal to JVM process
kill -3 <pid>
The JVM log will point you to the location of the javacore dump:
JVMDUMP039I Processing dump event "user", detail "" at 2018/09/18 18:24:03 - please wait.
JVMDUMP032I JVM requested Java dump using 'javacore.20180918.182403.10902.0001.txt' in response to an event
JVMDUMP010I Java dump written to javacore.20180918.182403.10902.0001.txt
JVMDUMP013I Processed dump event "user", detail "".
ops
├── Vagrantfile // Vagrant file
├── ansible // ansible scripts for installing and starting services, including: common runtime dependencies, zookeeper, kafka, nginx (for the local pkgs repo), JVM, filebeat, consul, nomad, elasticsearch, logstash, kibana and wrk
├── bin // script, including boostrap.sh, click.sh(fire an invocation on the sample), kafka-*-monitor.sh, start_all_jobs.sh and stop_all_jobs.sh
├── deployable // nomad job definition files(hcl) for microservices
├── deps // binary dependencies, cached locally to reduce (avoid) network dependencies during the demonstration
├── dist // pkgs publish folder, nginx is started on this folder to simulate a pkg repository
└── log-shipper // filebeat conf for log-shipper side-task
To keep the sample simple, I didn't introduce a real CI dependency (e.g. GitLab, Jenkins, etc.) here. I just created a gradle deploy task to pack the deployables into a folder named dist. Later, when the VM is provisioned, nginx will start up and serve an HTTP package repository from that folder.
cd <hello_springcloud_home>
gradle clean build
gradle deploy
The sample uses [vagrant](https://www.vagrantup.com/docs/installation/) to manage the VM. Make sure you have vagrant installed on your host.
vagrant up
All of the runtime dependencies, including common dependencies, elasticsearch, filebeat, hashi-ui, Java 8, kafka, kibana, logstash, nginx, nomad, wrk and zookeeper, are managed via ansible and will be initialized during VM provisioning.
vagrant provision
vagrant ssh
cd /vagrant/bin
./start_all_jobs.sh
Open the browser on your host with url "http://10.10.10.200:3000"
Open the browser on your host with url "http://10.10.10.200:9080"
vagrant ssh
cd /vagrant/bin
./click.sh
If the sample runs successfully, you will see output in stdout as below:
{"result":9}
In this sample, only the api-gateway's distributed tracing is reported to Kafka. Other services report traces via the HTTP protocol.
vagrant ssh
cd /vagrant/bin
./kafka-zipkin-topic-monitor.sh
Run the sample, and you will see messages like the one below:
[{"traceId":"ebe29e701126f8a2","parentId":"ebe29e701126f8a2","id":"499be36840d38ffb","name":"post","timestamp":1536129010924681,"duration":32367,"localEndpoint":{"serviceName":"api-gateway","ipv4":"10.0.2.15"},"tags":{"component":"zuul","http.status_code":"200"}},{"traceId":"ebe29e701126f8a2","id":"ebe29e701126f8a2","kind":"SERVER","name":"post","timestamp":1536129010924004,"duration":34677,"localEndpoint":{"serviceName":"api-gateway","ipv4":"10.0.2.15"},"tags":{"component":"java-web-servlet","http.method":"POST","http.status_code":"200","http.url":"http://10.10.10.200:2809/calculator-ui/api/v1/compute"}}]
In this sample, only the api-gateway's aggregated logging is shipped to Kafka. Other services' logs are shipped to Filebeat's stdout.
vagrant ssh
cd /vagrant/bin
./kafka-filebeat-topic-monitor.sh
Run the sample, and you will see messages like the ones below:
{"@timestamp":"2018-09-05T06:34:20.283Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.4.0","topic":"api-gateway-filebeat"},"input":{"type":"log"},"beat":{"version":"6.4.0","name":"hello-springcloud","hostname":"hello-springcloud"},"host":{"name":"hello-springcloud"},"source":"/data/nomad/data/alloc/c24d5f7a-c476-9aec-da51-6274a8e990e5/alloc/logs/api-gateway-task.stdout.0","offset":916841,"message":"[DEBUG]: [Enter]: [POST] -\u003e http://10.10.10.200:2809/calculator-ui/api/v1/compute","prospector":{"type":"log"}}
{"@timestamp":"2018-09-05T06:34:20.283Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.4.0","topic":"api-gateway-filebeat"},"message":"[DEBUG]: \u003e\u003e\u003e Content-Type: application/json","source":"/data/nomad/data/alloc/c24d5f7a-c476-9aec-da51-6274a8e990e5/alloc/logs/api-gateway-task.stdout.0","offset":916923,"prospector":{"type":"log"},"input":{"type":"log"},"beat":{"name":"hello-springcloud","hostname":"hello-springcloud","version":"6.4.0"},"host":{"name":"hello-springcloud"}}
{"@timestamp":"2018-09-05T06:34:20.283Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.4.0","topic":"api-gateway-filebeat"},"source":"/data/nomad/data/alloc/c24d5f7a-c476-9aec-da51-6274a8e990e5/alloc/logs/api-gateway-task.stdout.0","prospector":{"type":"log"},"input":{"type":"log"},"beat":{"name":"hello-springcloud","hostname":"hello-springcloud","version":"6.4.0"},"host":{"name":"hello-springcloud"},"offset":916967,"message":"[DEBUG]: [Exit]: [POST] [200] \u003c- http://10.10.10.200:2809/calculator-ui/api/v1/compute"}
vagrant ssh
cd /vagrant/bin
./benchmark.sh
Output:
Running 1m test @ http://localhost:2809/calculator-ui/api/v1/compute
5 threads and 5 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 123.07ms 158.81ms 1.88s 95.71%
Req/Sec 11.32 6.36 40.00 56.93%
Latency Distribution
50% 87.95ms
75% 138.00ms
90% 205.89ms
99% 1.01s
2985 requests in 1.00m, 437.76KB read
Requests/sec: 49.72
The nmon command runs in the background to do the sampling; the overall system performance analysis result is hello-springcloud_180918_0439.nmon.xlsx
Open browser on host machine with url http://10.10.10.200:9411
Open browser on host machine with url http://10.10.10.200:5601
vagrant ssh
cd /vagrant/bin
./stop_all_jobs.sh