A simple infrastructure debug orchestrator to bring your thoughts into discrete steps and tie them all together to effectively find or resolve issues.

Motivation

We all have our own nuances in how we debug issues, and we tend to think of debugging as an art. The aim of cortex is to bring some science and automation into how we debug infrastructure problems. The main thing cortex tries to do is bring structure to that art and provide an easy way to capture your thought process, so that it is easier to share with others and you no longer have to ponder "What did I do 2 weeks ago?". The hope is that this tool helps the SRE think in discrete steps that can then be collated, reused and executed in different ways. That helps not just in expressing the tasks better; the steps also become primers for the juniors in the team to learn the different ways to debug.

We understand there is a myriad of tools already out there. Our aim is not to build something for posterity, but to have something lightweight and easy to use until a better tool arrives. Even with tools like Chef and Ansible in use, the vast majority of debug steps are still just plain shell scripts, and our aim is to harness that knowledge and use it in a fashion that brings some order to it.

How does cortex work?

Cortex works on the following principles:

Every action we take can be identified as a discrete task, which is called a neuron.
Every action has an output that could end the debugging session, be passed to the next neuron, or be discarded.
You can create a plan composed of multiple neurons to be run in parallel or in sequence; this is called a synapse.
Each neuron can be fired independently by multiple synapses and needs to handle state accordingly.
A synapse should determine resolution based on analysis of the outputs from its neurons.

The architecture is quite simple: think of a synapse as building a directed acyclic graph of parallel vertices and sequential plans, which is then executed in an event loop. In that graph, p_neurons are executed in parallel and s_neurons in serial.
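
As a rough illustration of the parallel and serial execution described above, here is a minimal shell sketch. It is not how cortex is implemented: the neurons are hypothetical stand-in functions (check_a, check_b, check_c), and run_parallel and run_serial are names invented for this example.

```shell
#!/bin/sh
# Hypothetical stand-in neurons: each one is just a function with an exit code.
check_a() { return 0; }
check_b() { return 120; }
check_c() { return 0; }

# Run neurons one after another and print "name=exit_code" for each.
run_serial() {
  for n in "$@"; do
    rc=0
    "$n" || rc=$?
    echo "$n=$rc"
  done
}

# Start every neuron in the background, then wait for each one
# and print "name=exit_code" in the order the neurons were given.
run_parallel() {
  started=""
  for n in "$@"; do
    "$n" &
    started="$started $n:$!"   # remember "name:pid" for the wait loop
  done
  for entry in $started; do
    rc=0
    wait "${entry##*:}" || rc=$?
    echo "${entry%%:*}=$rc"
  done
}

run_parallel check_a check_b
run_serial check_c
```

Each neuron reports only an exit code here; deciding what to do with those codes is left to the caller, which mirrors the idea that the synapse makes the determination.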

Creating neurons

A neuron is a folder that contains a run script, which exits with a defined exit code, and a YAML configuration file named neuron.yaml. A few conventions are worth following:

If the neuron script does not mutate anything, by convention, start its name with "check_" as the prefix. For example: "check_web_proxy_connection_config".
If it is a mutating neuron, i.e. it updates a config or property, use "mutate_" as the prefix. For example: "mutate_web_proxy_connection_config".

This helps the reader quickly tell a harmless synapse from one they need to be careful about running in, say, a production environment. The name of the folder should give enough indication of the activity.
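
The prefix convention also makes it easy to classify neurons in a script. A minimal sketch, assuming only the check_ and mutate_ prefixes described above; neuron_kind is a hypothetical helper for illustration, not a cortex command:

```shell
#!/bin/sh
# neuron_kind NAME_OR_PATH
# Prints "check", "mutate" or "unknown" based on the folder name prefix.
# Hypothetical helper; not part of the cortex CLI.
neuron_kind() {
  case "$(basename "$1")" in
    check_*)  echo "check" ;;
    mutate_*) echo "mutate" ;;
    *)        echo "unknown" ;;
  esac
}

for neuron in /usr/neurons/check_web_proxy_conn_config \
              mutate_web_proxy_conn_bump_maxconn_config; do
  echo "$(basename "$neuron"): $(neuron_kind "$neuron")"
done
```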

To create a neuron, run:

cortex create-neuron check_web_proxy_conn_config

For more options, run: cortex create-neuron -h

This creates a folder with bootstrap files as below:

check_web_proxy_conn_config
    |----- neuron.yaml
    |----- run.sh
    |----- run.ps1

sample neuron.yaml:

---
name: check_web_proxy_conn_config
type: check
description: "A longer description"
exec_file: %s
pre_exec_debug: "Going to check the web_proxy connection configuration"
assertExitStatus: [0, 137]
post_exec_success_debug: "All configurations checkout ok"
post_exec_fail_debug:
  120: "Found maxconn rate to be too low"
  110: "Found maxpipes to be too low"

As you can see, the exit code plays a big part in how your neurons propagate the debugging to the next step within the synapse.
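
To make the role of exit codes concrete, here is a sketch of what the core of a run.sh for this neuron might look like. The exit codes 120 and 110 come from the sample neuron.yaml above; the thresholds (MIN_MAXCONN, MIN_MAXPIPES), the check_limits function and the way the values are supplied are all assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical thresholds; a real neuron would pick values that suit its proxy.
MIN_MAXCONN=2000
MIN_MAXPIPES=500

# check_limits MAXCONN MAXPIPES
# Returns 120 if maxconn is too low, 110 if maxpipes is too low, 0 otherwise.
check_limits() {
  [ "$1" -ge "$MIN_MAXCONN" ] || return 120
  [ "$2" -ge "$MIN_MAXPIPES" ] || return 110
  return 0
}

# In a real neuron the two values would be read from the proxy configuration.
rc=0
check_limits 4000 1000 || rc=$?
echo "exit code: $rc"
# A real run.sh would finish with: exit "$rc"
```

Keeping one exit code per finding lets the messages under post_exec_fail_debug map one-to-one onto what the script detected.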

Creating synapse

To create a synapse, run:

cortex create-synapse app_network_latency

This creates a folder with the following bootstrap files:

app_network_latency
     |------ synapse.yaml

For more options, run: cortex create-synapse -h

To add a neuron to the synapse to be planned in sequence, run:

cortex add-neuron --synapse app_network_latency --neuron /usr/neurons/check_web_proxy_conn_config --sequence

and to add the same to be run in parallel:

cortex add-neuron --synapse app_network_latency --neuron /usr/neurons/check_web_proxy_conn_config --parallel

For more options, run: cortex add-neuron -h

A sample synapse yaml when you want to fix something on occurrence of an error:

---
name: app_network_latency
definition:
  - neuron: check_web_proxy_conn_config
    config:
      path: /usr/neurons/check_web_proxy_conn_config
      fix:
        - 120: mutate_web_proxy_conn_bump_maxconn_config
        - 110: mutate_web_proxy_conn_bump_maxpipes_config
  - neuron: check_api_gateway_conn_config
    config:
      path: /usr/neurons/check_api_gateway_conn_config
      fix:
        - 120: mutate_api_gateway_conn_bump_maxconn_config
        - 110: mutate_api_gateway_conn_bump_maxpipes_config
  - neuron: mutate_web_proxy_conn_bump_maxconn_config
    config:
      path: /usr/neurons/mutate_web_proxy_conn_bump_maxconn_config
plan:
  config:
    - exit_on_first_error: false
  steps:
    serial:
      - check_api_gateway_conn_config
      - check_web_proxy_cpu_usage
      - check_grafana_cpu_trend
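
The fix entries in the sample above pair an exit code of a check neuron with the mutating neuron that addresses it. The sketch below shows that lookup in plain shell. It only illustrates the idea and is not how cortex resolves fixes: fix_for_exit_code is a hypothetical helper, and the check neuron is a stand-in function that always reports exit code 120.

```shell
#!/bin/sh
# fix_for_exit_code CODE
# Prints the mutating neuron registered for CODE, mirroring the fix list of
# check_web_proxy_conn_config. Returns 1 when no fix is registered.
fix_for_exit_code() {
  case "$1" in
    120) echo "mutate_web_proxy_conn_bump_maxconn_config" ;;
    110) echo "mutate_web_proxy_conn_bump_maxpipes_config" ;;
    *)   return 1 ;;
  esac
}

# Stand-in check neuron that reports "maxconn too low" (exit code 120).
check_web_proxy_conn_config() { return 120; }

rc=0
check_web_proxy_conn_config || rc=$?
if fix=$(fix_for_exit_code "$rc"); then
  echo "exit code $rc -> would fire $fix"
else
  echo "exit code $rc -> no fix registered"
fi
```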

A synapse that only checks and does not mutate:

---
name: app_network_latency
definition:
  - neuron: check_web_proxy_conn_config
    config:
      path: /usr/neurons/check_web_proxy_conn_config
  - neuron: check_api_gateway_conn_config
    config:
      path: /usr/neurons/check_api_gateway_conn_config
plan:
  config:
    - exit_on_first_error: false
  steps:
    serial:
      - check_web_proxy_conn_config
      - check_api_gateway_conn_config
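
The exit_on_first_error setting suggests that a serial plan can either stop at the first failing neuron or keep going. The sketch below shows one plausible reading of that behavior. The exact semantics in cortex are an assumption here, and the neurons (check_one, check_two, check_three) and run_serial_plan are hypothetical names used only for this example.

```shell
#!/bin/sh
# Hypothetical stand-in neurons; check_two fails with exit code 110.
check_one() { return 0; }
check_two() { return 110; }
check_three() { return 0; }

# run_serial_plan EXIT_ON_FIRST_ERROR NEURON...
# Runs the neurons in order and prints "name=exit_code" for each one executed.
# When EXIT_ON_FIRST_ERROR is "true", stops after the first non-zero exit code.
run_serial_plan() {
  stop_on_error=$1
  shift
  for neuron in "$@"; do
    rc=0
    "$neuron" || rc=$?
    echo "$neuron=$rc"
    if [ "$rc" -ne 0 ] && [ "$stop_on_error" = "true" ]; then
      break
    fi
  done
  return 0
}

echo "exit_on_first_error: false"
run_serial_plan false check_one check_two check_three
echo "exit_on_first_error: true"
run_serial_plan true check_one check_two check_three
```

With the flag set to false every check runs, which suits a read-only synapse where you want the full picture before deciding on a fix.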

A synapse that runs the checks in parallel:

---
name: app_network_latency
definition:
  - neuron: check_web_proxy_conn_config
    config:
      path: /usr/neurons/check_web_proxy_conn_config
      fix:
        - 120: mutate_web_proxy_conn_bump_maxconn_config
        - 110: mutate_web_proxy_conn_bump_maxpipes_config
  - neuron: check_api_gateway_conn_config
    config:
      path: /usr/neurons/check_api_gateway_conn_config
      fix:
        - 120: mutate_api_gateway_conn_bump_maxconn_config
        - 110: mutate_api_gateway_conn_bump_maxpipes_config
  - neuron: mutate_web_proxy_conn_bump_maxconn_config
    config:
      path: /usr/neurons/mutate_web_proxy_conn_bump_maxconn_config
plan:
  config:
    - exit_on_first_error: false
  steps:
    parallel:
      - check_web_proxy_conn_config
      - check_api_gateway_conn_config
      - check_web_proxy_cpu_usage
      - check_grafana_cpu_trend
