quipo/workerpoolmanager

Task / worker pool manager in Go

  • Start cli tasks automatically
  • Maintain the desired number of worker processes for each task
  • Handle automatic restarts when a worker dies or stalls

The task manager can start any CLI (shell) script from the chosen directory. For long-running tasks that are meant to be monitored continuously, each worker process should send regular keep-alive messages via a ZeroMQ PUB-SUB channel to communicate its health, and should handle SIGTERM signals when asked to terminate. If a worker doesn't respond to SIGTERM, it is killed with SIGKILL after a configurable grace period. The number of workers that stalled or stopped since the task manager was started is reported in the task status.
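The SIGTERM-then-SIGKILL escalation described above could look roughly like the sketch below. It is not the manager's actual code: `signaller`, `stopGracefully`, `fakeProc` and `simulate` are hypothetical names, and the 50ms grace period is only a demo value. A fake process is used so the example runs anywhere.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"time"
)

// signaller is the subset of *os.Process used here, so either a real
// process or a fake one can be passed in. (Hypothetical name.)
type signaller interface {
	Signal(sig os.Signal) error
	Kill() error
}

// stopGracefully sends SIGTERM, waits up to grace for the process to exit
// (signalled by done being closed), and falls back to SIGKILL otherwise.
// It reports whether the process had to be killed.
func stopGracefully(p signaller, done <-chan struct{}, grace time.Duration) (forced bool, err error) {
	if err := p.Signal(syscall.SIGTERM); err != nil {
		return false, err
	}
	timer := time.NewTimer(grace)
	defer timer.Stop()
	select {
	case <-done:
		return false, nil // exited within the grace period
	case <-timer.C:
		if err := p.Kill(); err != nil {
			return true, err
		}
		<-done // wait for the exit to be observed
		return true, nil
	}
}

// fakeProc simulates a worker: it exits on SIGTERM only when polite is
// true, and always exits on Kill.
type fakeProc struct {
	polite bool
	done   chan struct{}
}

func (f *fakeProc) Signal(sig os.Signal) error {
	if f.polite {
		close(f.done)
	}
	return nil
}

func (f *fakeProc) Kill() error {
	close(f.done)
	return nil
}

// simulate runs stopGracefully against a fake worker (demo grace: 50ms).
func simulate(polite bool) bool {
	p := &fakeProc{polite: polite, done: make(chan struct{})}
	forced, _ := stopGracefully(p, p.done, 50*time.Millisecond)
	return forced
}

func main() {
	fmt.Println("polite worker killed:", simulate(true))
	fmt.Println("stubborn worker killed:", simulate(false))
}
```

With a real worker started through `os/exec`, `cmd.Process` satisfies the same interface, and `done` would be closed by a goroutine once `cmd.Wait()` returns.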

The main package is the Task Manager (wpmanager), which can load the configuration, start some tasks automatically, and handle signals (CTRL+C) and HTTP requests to control the state of the tasks.

There's also a sample console application (wpconsole) to control the status and the configuration of each task from the command line. It supports controlling task managers running on different hosts/ports, and has a tab-completion interface (for both commands and task names).

Usage

Installation and configuration

go install github.com/quipo/workerpoolmanager/wpmanager
go install github.com/quipo/workerpoolmanager/wpconsole

Prepare a configuration file along the lines of the provided example.

Start the manager by pointing it to the configuration file:

./wpmanager -conf=tasks.json

Terminate the manager with CTRL+C: the manager will ask the tasks (and all their workers) to stop gracefully before shutting down.

Control the status of the tasks using the console:

./wpconsole -host=localhost:8010

> ls
Task1
Task2
Task3

> set Task1 cardinality 10
Changed workers cardinality

> stop Task3
Stopped Task3 workers

> start Task3
Started task manager 

> listworkers Task3
Task: Task3,	Pid: 5991,	Started: 2014-01-23T18:05:55Z,	Last alive at: 2014-01-23T18:07:37Z
Task: Task3,	Pid: 6179,	Started: 2014-01-23T18:06:03Z,	Last alive at: 2014-01-23T18:07:41Z

> stop
Stopped Task3 workers
Stopped Task1 workers
Stopped Task2 workers

> status
No active tasks

> quit

or using the HTTP interface:

$ curl http://localhost:8010/tasks

No active tasks

$ curl -X POST http://localhost:8010/tasks/Task1/start

Started task manager

$ curl http://localhost:8010/tasks/Task1

Task name:      Task1
Running:        true
Started at:     2014-01-21T09:56:45Z
Last alive at:  2014-01-21T09:57:53Z
Active Workers: 1 / 1
Dead Workers:   0

$ curl -X POST http://localhost:8010/tasks/Task1/set/cardinality/10

Changed workers cardinality

$ curl http://localhost:8010/tasks/Task1

Task name:      Task1
Running:        true
Started at:     2014-01-21T09:56:45Z
Last alive at:  2014-01-21T09:57:53Z
Active Workers: 10 / 10
Dead Workers:   1

$ curl -X DELETE http://localhost:8010/tasks/Task1

Stopped Task1 workers


$ curl -X DELETE http://localhost:8010/tasks

Stopped Task3 workers
Stopped Task2 workers

Programmatic usage

Make sure the ZeroMQ and readline development packages are installed (zeromq-devel and readline-devel on RPM-based distributions).

Fetch the dependencies and the packages:

# install dependencies
go get github.com/codegangsta/martini
go get github.com/bobappleyard/readline
go get github.com/pebbe/zmq4
go get github.com/quipo/goprofiler/profiler

# install packages
go get github.com/quipo/workerpoolmanager
go get github.com/quipo/workerpoolmanager/taskmanager
go get github.com/quipo/workerpoolmanager/utils

Init a task runner:

package main

import (
	"github.com/quipo/workerpoolmanager/taskmanager"
	"github.com/quipo/workerpoolmanager/utils"
)

func main() {
	//...
}

API Documentation

View the GoDoc generated documentation here.

Worker examples

The resources/examples/tasks/ folder contains some worker examples in a few languages. Note the signal handler (to catch SIGTERM signals and terminate gracefully) and the keep-alive messages (sent by the workers to the manager).
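For orientation, a worker's main loop could be shaped like the sketch below. It is not taken from the bundled examples: `runWorker` and `simulate` are hypothetical names, the keep-alive is a placeholder callback rather than a real ZeroMQ PUB send (the message format expected by the manager is not described here), and the goroutine that delivers SIGTERM after 350ms exists only so that the demo terminates on its own.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// runWorker loops until a termination signal arrives. On every tick it
// calls sendKeepAlive; the actual work is omitted. It returns the number
// of keep-alives sent.
func runWorker(ticks <-chan time.Time, stop <-chan os.Signal, sendKeepAlive func()) int {
	sent := 0
	for {
		select {
		case <-stop:
			return sent // clean up and exit gracefully
		case <-ticks:
			sendKeepAlive()
			sent++
		}
	}
}

// simulate drives runWorker with n ticks followed by a SIGTERM, using
// unbuffered channels so the sequence is deterministic.
func simulate(n int) int {
	ticks := make(chan time.Time)
	stop := make(chan os.Signal)
	go func() {
		for i := 0; i < n; i++ {
			ticks <- time.Now()
		}
		stop <- syscall.SIGTERM
	}()
	return runWorker(ticks, stop, func() {})
}

func main() {
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, os.Interrupt)

	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()

	// Demo only: stands in for the manager's SIGTERM.
	go func() {
		time.Sleep(350 * time.Millisecond)
		stop <- syscall.SIGTERM
	}()

	sent := runWorker(ticker.C, stop, func() {
		fmt.Println("keep-alive") // placeholder for a ZeroMQ PUB send
	})
	fmt.Println("terminating gracefully after", sent, "keep-alives")
}
```

Keeping the signal and ticker channels as parameters makes the loop easy to exercise without real signals or timers.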

TODO

  • Better logging (implement INFO/WARNING/ERROR levels)
  • Use concurrent map for workers?
  • Test suite
  • Command.Type => constants instead of string literals
  • Send "stopping" message on keep-alive channel on worker exit
  • Alternate machine-readable (JSON) output for responses from the HTTP interface (via Accept headers)
  • Break out dead workers number into stalls, successful exits and bad exits
  • Add metrics - probably to Riemann
  • Ability to start / stop all the autostart tasks

Design

  1. Task Runner

    • holds a list of all the available tasks
    • keeps an open channel to communicate with each Task Manager
    • keeps a command channel to communicate with the Signal and HTTP handlers
  2. Task Manager

    • holds a list of all the running workers
    • keeps an open channel to communicate with the Task Runner
    • keeps an open channel to send commands to each worker
    • keeps a feedback channel to get messages from all workers
  3. Worker

    • waits on the process to capture when it exits/dies
    • has a ticker to regularly check if the process stalled (i.e. running but not sending keep-alives)
    • keeps an open channel to receive commands from the parent Task Manager
    • keeps a keep-alive channel to receive keep-alives from the parent Task Manager
  4. Signal Handler

    • Detects CTRL+C signals and asks the Task Runner to terminate
  5. Keep-alive Handler

    • Listens for worker keep-alive messages on a ZMQ PubSub channel
    • Asks the Task Manager to update the workers' last-seen-alive datetime
  6. HTTP Handler

    • Listens for requests to list, start and stop Task Managers, or change the cardinality of their workers
  7. Console App / HTTP client

    • Tools/Libraries to communicate with the HTTP Handler, to get the status and control the Task Managers.
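The command-channel arrangement in the list above can be illustrated with a small sketch. This is an assumption-laden toy, not the package's real types: the `Command` struct, its fields, the command names and the reply strings are all hypothetical. It only shows one goroutine owning the task state while handlers talk to it through a channel.

```go
package main

import "fmt"

// Command is a hypothetical message sent by a handler to the Task Runner.
type Command struct {
	Type  string      // hypothetical values: "start", "stop", "status"
	Task  string      // task name, empty for "status"
	Reply chan string // where the Task Runner sends its answer
}

// runTaskRunner owns the task state and serves commands until the
// channel is closed, so no locking is needed around the map.
func runTaskRunner(cmds <-chan Command) {
	running := map[string]bool{}
	for cmd := range cmds {
		switch cmd.Type {
		case "start":
			running[cmd.Task] = true
			cmd.Reply <- "started " + cmd.Task
		case "stop":
			delete(running, cmd.Task)
			cmd.Reply <- "stopped " + cmd.Task
		case "status":
			cmd.Reply <- fmt.Sprintf("%d active task(s)", len(running))
		default:
			cmd.Reply <- "unknown command " + cmd.Type
		}
	}
}

// send is what a Signal or HTTP handler would call: it submits a command
// and waits for the reply.
func send(cmds chan<- Command, typ, task string) string {
	reply := make(chan string, 1)
	cmds <- Command{Type: typ, Task: task, Reply: reply}
	return <-reply
}

func main() {
	cmds := make(chan Command)
	go runTaskRunner(cmds)

	fmt.Println(send(cmds, "start", "Task1"))
	fmt.Println(send(cmds, "status", ""))
	fmt.Println(send(cmds, "stop", "Task1"))
	fmt.Println(send(cmds, "status", ""))
	close(cmds)
}
```

Routing every state change through one goroutine is a common Go alternative to guarding a shared map with a mutex.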
                                Cmd channel
                               ╔══════════════════════════════════════════════════════════
                               ║                            ^                      ^
                               ║                            |                      |
                               ║                  +--------------------+           |
+--------+                     ║              ____| Keep-alive Handler |           |
|        |            +----------------+     /    +--------------------+           |
|        |      +---- | Task Manager 1 |-----          ^   zeromq   ^              |
|        |      |     |----------------|     \         |   PubSub   |              |
|        |------+---- |      ...       |      \    +----------+----------+-----+----------+
|        |      |     |----------------|       +---| Worker 1 | Worker 2 | ... | Worker N |
|        |      +---- | Task Manager N |           +----------+----------+-----+----------+
|        |            +----------------+              |    \                
|        |                    |                       |     \               
|        |                    |                 +----------+ \ +---------+
|        |                    |                 | Stalled  |  \| syswait |
|        |                    |                 | Worker   |   |   on    |
|        |                    |                 | Detector |   | process |
|        |                    |                 +----------+   +---------+
|        |                    |
|        |                    |
|        |                    |
|  Task  |  Cmd Channel       v
| Runner |════════════════════════════════════════════
|        |                    ^             ^
|        |                    |             |
|        |_____               |             |
|        |     \              |             |
|        |      \     +----------------+    |
|        |       ¯¯¯¯¯| Signal Handler |    |
|        |            +----------------+    |
|        |_____                             |
|        |     \                            |
|        |      \                   +----------------+
|        |       ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯|  HTTP Handler  |
|        |                          +----------------+
+--------+                                 ^
                                           ║
                            ______________/ \_________________
                           /                                  \
                  +----------------+                  +----------------+
                  |   Console App  |                  |   HTTP Client  |
                  +----------------+                  +----------------+

Contribute

Contributions are welcome. Please open pull requests or issue reports!

Author

Lorenzo Alberton

License

This repository is Copyright (c) 2014-2015 Lorenzo Alberton, All rights reserved. It is licensed under the MIT license. Please see the LICENSE file for applicable license terms.