# How to store stuff and make sense of the ELK stack: Amazing doc from Wilsonmar

This repo contains notes, configuration, and source files on building a way to analyze logs with only free/open-source software. The components include the "ELK" stack, which stands for Elasticsearch, Logstash, and Kibana:

  1. GitHub for documentation and source control.

  2. Ubuntu Linux (Amazon build) running on all servers, with DEB packages installed using the dpkg command.

  3. stunnel to secure (encrypt) communications.

  4. Docker to install packages on servers.

  5. Logstash Forwarder on all servers to direct the flow of log entries to a collector.

  6. JMeter to generate artificial load on the system, using Java programs to emulate many real clients.

  7. Maven to package Java code.

  8. Puppet to manage configurations.

  9. NGINX to distribute traffic among servers.

  10. Logstash collects timestamped logs of various formats from various sources, parses them to filter out junk, and normalizes them into JSON so they can be indexed and searched in a central location. This beats running awk, grep, etc. on individual machines.

  11. RabbitMQ queue services between Logstash producers and consumers to ensure scalability by absorbing spikes.

  12. Elasticsearch stores data in inverted (Lucene) indexes, supports nested aggregations, and can work with data in Hadoop (via es-hadoop).

  13. Curator to manage Elasticsearch indexes by letting admins schedule operations to optimize, close, and delete indexes.

  14. Kibana does data discovery on an Elasticsearch cluster to identify "actionable insights" and presents visualizations (a dashboard).
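Curator works on time-based indexes, such as the daily ones Logstash creates by default (named like `logstash-2015.08.01`). The selection step behind a "delete indexes older than N days" job can be sketched in Python as below. `indexes_older_than` is a hypothetical helper, not Curator's code or API, and it assumes the default `logstash-YYYY.MM.dd` naming.

```python
from datetime import date, timedelta

def indexes_older_than(names, today, days, prefix="logstash-"):
    """Return daily index names whose date suffix is more than `days` days before `today`."""
    cutoff = today - timedelta(days=days)
    old = []
    for name in names:
        if not name.startswith(prefix):
            continue  # skip indexes that don't follow the daily naming scheme
        try:
            y, m, d = (int(part) for part in name[len(prefix):].split("."))
            day = date(y, m, d)
        except ValueError:
            continue  # suffix is not a valid YYYY.MM.dd date
        if day < cutoff:
            old.append(name)
    return sorted(old)

names = ["logstash-2015.07.01", "logstash-2015.07.30", "logstash-2015.08.01", ".kibana"]
print(indexes_older_than(names, date(2015, 8, 1), days=7))
# ['logstash-2015.07.01']
```

Curator then runs the chosen operation (close, optimize, delete) on the indexes selected this way.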

Elastic also offers cloud services and related paid (licensed) software to manage and protect the ELK stack:

  1. Found (hosted Elasticsearch in the cloud).
  2. Shield to secure data in Elasticsearch.
  3. Marvel to monitor Elasticsearch deployments.
  4. Watcher for alerting on Elasticsearch data.
  5. Packetbeat to analyze network packet data.

Instead of piping individual logs through commands such as:

    $ log_producer | grep ... | sed ... | awk ... | tee output \
        | sort | uniq -c | sort -n

The ELK stack collects logs centrally and makes different timestamp formats consistent.
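To make the comparison concrete, here is what the counting tail of that pipeline (`grep | sort | uniq -c | sort -n`) computes, sketched in Python with a hypothetical `count_sorted` helper. The sed/awk steps are left out. Each machine has to run something like this separately over its own logs, which is the kind of question the ELK stack answers centrally.

```python
from collections import Counter

def count_sorted(lines, keyword):
    """Rough equivalent of: grep KEYWORD | sort | uniq -c | sort -n"""
    matches = [line for line in lines if keyword in line]  # grep
    counts = Counter(matches)                               # sort | uniq -c
    # sort -n: ascending by count, so the most frequent line comes last
    return sorted(counts.items(), key=lambda item: item[1])

log = ["ERROR disk full", "INFO ok", "ERROR disk full", "ERROR timeout"]
print(count_sorted(log, "ERROR"))
# [('ERROR timeout', 1), ('ERROR disk full', 2)]
```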

Kibana "democratizes" data by putting a front end on it, so data can be searched quickly and in meaningful ways. You can submit updates to Elasticsearch: The Definitive Guide.

There is a lighter edition of Logstash.

Kibana and Elasticsearch started as open source projects, built by devops people for devops people.

It is priced per node managed and monitored at scale (less than Splunk, and it doesn't run out of gas). There's no separate enterprise edition.

Marvel is free until it is used in production. Splunk, by contrast, gets expensive (millions) beyond its first 500 MB of free indexing.

Features that set the stack apart from competitors to Logstash include:

  • D3 JS library flexibility

  • Watcher (alerting)

  • Shield support for security

  • Bulk operations (for indexing and search operations)

  • Percolator ("reversed search" - alerts, classification)

  • Suggesters ("Did you mean ...?")

  • Index aliases (Grouping, filtering or "renaming" of indices every day)

  • Index templates (automatic index configuration)

  • Monitoring API (amount of memory used, number of operations, etc.)

  • Pie charts have nested levels
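Bulk operations batch many index requests into one HTTP call. Below is a minimal sketch of building such a request body, assuming the Elasticsearch 1.x `_bulk` format: an action line followed by the document, one JSON object per line, with a trailing newline. `bulk_body` is a hypothetical helper, and the index and type names are only examples. Sending the body (a POST to `/_bulk`) is left out.

```python
import json

def bulk_body(index, doc_type, docs):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk endpoint.

    Each document is preceded by an "index" action line, and the whole body
    must end with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

body = bulk_body("logstash-2015.08.02", "logs", [{"message": "testing"}, {"message": "again"}])
print(body)
```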

  1. Identify the set of versions of various components from their web pages:

| Component | Version | Length |
|---|---|---|
| Logstash | 1.5.3 | 88M |
| Logstash Forwarder (Mac) | 1.5.3 | 5.8M |
| Elasticsearch | 1.7.1 | 27M |
| Kibana | 3.1.3 | 1.0M |

Clicking Download on these web pages saves the file to your default Downloads folder.

NOTE: Kibana version 4 is a major upgrade over version 3.

  2. PROTIP: Create a folder such as ELK_Installers_20150801 so the same set of versions tested together travel together.

  3. If you'd rather download and expand using a script as shown below, revise the version numbers to the latest ones.

    mkdir ELK_Installers_20150801
    cd ELK_Installers_20150801

    wget
    wget
    wget
    wget

Instead of `wget`, you can use `curl -O`.

4. In production mode, each component is usually installed on a separate machine,
so a different download and installation script is used for each machine.
For experimentation on a MacBook, though, all components are installed on one machine.

5. Decide on the folder into which the components are expanded.

It's better if components are referenced in a folder without a version number.

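    # --strip-components=1 drops the versioned top folder (e.g. logstash-1.5.3/)
    mkdir /usr/local/logstash
    tar -zxvf logstash-1.5.3.tar.gz --strip-components=1 -C /usr/local/logstash

    mkdir /usr/local/elasticsearch
    tar -zxvf elasticsearch-1.7.1.tar.gz --strip-components=1 -C /usr/local/elasticsearch

    mkdir /usr/local/kibana
    tar -zxvf kibana-3.1.3.tar.gz --strip-components=1 -C /usr/local/kibana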

Unix machines often install add-on packages under the <strong>/opt</strong> directory.
But Macs don't have that directory by default.

Other folders in <strong>/usr/local</strong> include Cellar, Library, opt, lib, bin, sbin, and man.
So a better location may be <strong>/usr/local/opt</strong>.

When using a basic OS X Server, it may be:

    cp -R /usr/local/kibana/kibana-3.0.0/*

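A machine needs more than 15% of its disk free (that is, less than 85% used). At 85% usage, Elasticsearch's default <strong>low disk watermark</strong>,
Elasticsearch stops allocating new shards to that node, which limits what Logstash can index there.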
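A quick way to check a machine against that threshold is sketched below in Python. It assumes the default low watermark of 85% used (setting `cluster.routing.allocation.disk.watermark.low`). It checks `/` only as an example, because the filesystem that matters is the one holding Elasticsearch's data path. `below_low_watermark` is a hypothetical helper.

```python
import shutil

LOW_WATERMARK = 0.85  # default for cluster.routing.allocation.disk.watermark.low

def below_low_watermark(used_bytes, total_bytes, watermark=LOW_WATERMARK):
    """True if disk usage is under the low watermark, so new shards can still be allocated."""
    return used_bytes / total_bytes < watermark

usage = shutil.disk_usage("/")  # named tuple with total, used, free (bytes)
print(f"{usage.used / usage.total:.0%} used; OK for new shards: "
      f"{below_low_watermark(usage.used, usage.total)}")
```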

6. Once expanded, archive the installer folder and delete the tar.gz files.
An alternative suggestion is to keep older binaries in case they get revved out (removed from the download site) while a script still depends on them.

7. Follow the <strong>Configuration Steps</strong>.

## <a name="LogstashConfig"> Logstash Configuration</a>
1. Create a configuration file using the Sublime Text editor:

    subl logstash.conf

2. Copy the following and paste it into the .conf editor window:

    input {
      stdin { }
    }
    filter {
      # grok runs only on events whose type is "apache"
      if [type] == "apache" {
        grok {
          match => { "message" => "%{COMBINEDAPACHELOG}" }
        }
      }
    }
    output {
      stdout { codec => rubydebug }
      elasticsearch { embedded => true }
    }
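To see what the grok filter does with `%{COMBINEDAPACHELOG}`, here is a rough Python equivalent using a regular expression with named groups. It is a simplified approximation, not the real grok pattern. It reuses grok's field names, but unlike grok's quoted-string pattern it drops the quotes around `referrer` and `agent`, and it does not cover every Apache edge case. A line that does not match returns `None` here. A grok filter that fails to match tags the event `_grokparsefailure` instead.

```python
import re

# Simplified approximation of grok's COMBINEDAPACHELOG, using the same field names.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return a dict of named fields, or None when the line does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" "Mozilla/4.08"')
print(parse_combined(line))
print(parse_combined("testing"))  # None
```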

A basic Logstash configuration file contains 3 blocks: input, filter, and output.

Each block contains plugins, which are distributed as RubyGems to ease packaging and distribution.

Filters are applied in the order they are specified in the .conf file.

Field names are referenced between %{ and }, as in %{host}. In grok patterns, the form is %{PATTERN:fieldname}.
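The `%{field}` substitution (Logstash's "sprintf format") can be sketched in Python as below. `sprintf` is a hypothetical helper that handles only flat field names. It does not cover nested `[a][b]` references or date formats such as `%{+YYYY.MM.dd}`, and this sketch simply leaves unknown references unchanged.

```python
import re

FIELD_REF = re.compile(r"%\{([^}]+)\}")

def sprintf(template, event):
    """Replace each %{field} with the event's value; unknown fields are left unchanged."""
    def substitute(match):
        name = match.group(1)
        return str(event[name]) if name in event else match.group(0)
    return FIELD_REF.sub(substitute, template)

event = {"host": "web01", "type": "apache"}
print(sprintf("%{type} log from %{host}", event))  # apache log from web01
```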

  3. Associate .conf files with a text editor.

  4. If using the vi editor, press Esc, then type :wq to write the file and quit vi.

A sample install script:

    #install logstash (based on
    sudo wget
    sudo mkdir /opt/logstash
    sudo mv logstash-1.3.2-flatjar.jar /opt/logstash/logstash.jar
    sudo wget
    sudo wget
    sudo mv hello.conf /opt/logstash/hello.conf
    sudo mv hello-search.conf /opt/logstash/hello-search.conf
    cd /opt/logstash/
    #example configuration
    java -jar logstash.jar agent -f hello.conf
    java -jar logstash.jar agent -f hello-search.conf

Logstash runs on JRuby (Ruby on the Java VM, chosen for performance), which is why it is launched with java here. Logstash is extensible with Ruby.

  5. Run Logstash using a script in the bin folder and the .conf file just created:

    bin/logstash agent --debug -f logstash.conf

    See the list of command-line flags. If the command includes --configtest (or just -t), Logstash checks the configuration and exits instead of running.

    If a folder is specified, such as /etc/logstash/conf.d, all .conf files in it are loaded.

    To stop on a Mac, hold down control and press C. On Windows, it's Ctrl+C.

    In production mode, Logstash would be started as a service (Unix daemon):

    sudo service logstash start

    Logstash sends its own log output to /var/log/logstash/logstash.log by default.

Logs can arrive at Logstash brokers from various shippers (origins):

  • Files
  • Syslog
  • Microsoft Windows Eventlogs
  • WebSockets
  • ZeroMQ
  • Twitter
  • SNMPTrap
  • geoIP

From the brokers, events go into a Lucene index, accessed through the storage and search server (Elasticsearch), which has a web interface.

The lifecycle of a log: Record, Transmit, Store, Delete.

  6. Since the input is stdin (the command line), type testing and press Enter to see this debug response:

        "message" => "testing",
       "@version" => "1",
     "@timestamp" => "2015-08-02T02:02:06.903Z",
           "host" => "Wilsons-MacBook-Pro.local"


The Z in the timestamp stands for GMT/UTC "Zulu" time, basically London time without
British Summer Time (the UK's equivalent of what the US calls Daylight Saving Time).
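Reading such a timestamp and converting it to a local zone can be sketched in Python as below. The example assumes US Pacific Daylight Time (UTC-7) as the local zone, using a fixed offset rather than a full time-zone database.

```python
from datetime import datetime, timedelta, timezone

# "%z" accepts a literal "Z" (meaning UTC) from Python 3.7 on.
ts = datetime.strptime("2015-08-02T02:02:06.903Z", "%Y-%m-%dT%H:%M:%S.%f%z")
print(ts.tzinfo)  # UTC

# US Pacific Daylight Time is UTC-7, so this event happened the previous evening there.
pdt = timezone(timedelta(hours=-7), "PDT")
print(ts.astimezone(pdt).isoformat())  # 2015-08-01T19:02:06.903000-07:00
```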

## <a name="LogFormats"> Log Input Formats</a>
Data Formats:

* Multi-line stack traces
* Regex
* Grok (Regex on steroids)
* Zabbix
* SQS (Amazon)

Logstash (through its date filter) normalizes different timestamp formats into one consistent format, stored in @timestamp in UTC.
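The date filter is given one or more Joda-style patterns (for example `dd/MMM/yyyy:HH:mm:ss Z` or the keyword `ISO8601`) and tries them against a field. The same idea is sketched below in Python, with `strptime` formats standing in for Joda patterns. The format list and the `normalize` helper are hypothetical. The rule that timestamps without a zone are UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone

# Hypothetical input formats, tried in order.
FORMATS = [
    "%d/%b/%Y:%H:%M:%S %z",    # Apache: 10/Oct/2000:13:55:36 -0700
    "%Y-%m-%dT%H:%M:%S.%f%z",  # ISO 8601 with zone: 2015-08-02T02:02:06.903Z
    "%Y-%m-%d %H:%M:%S",       # no zone given
]

def normalize(raw):
    """Return the timestamp as ISO 8601 in UTC with millisecond precision and a trailing Z."""
    for fmt in FORMATS:
        try:
            dt = datetime.strptime(raw, fmt)
        except ValueError:
            continue  # try the next format
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assumption: zoneless stamps are UTC
        utc = dt.astimezone(timezone.utc)
        return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    raise ValueError(f"unrecognized timestamp: {raw!r}")

print(normalize("10/Oct/2000:13:55:36 -0700"))  # 2000-10-10T20:55:36.000Z
```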

### <a name="LogOutputs"> Logstash Outputs</a>
Outputs fall into these categories:

* Redis
* RabbitMQ
* TCP/UDP socket
* Kafka
* Syslog

* Elasticsearch
* MongoDB
* Amazon S3
* File

* PagerDuty
* Nagios monitoring
* Email
* Amazon Cloudwatch
* Alerting tools (Hipchat, SMS)

Metrics (graphics):
* StatsD
* Graphite
* Ganglia

### <a name="Brokers"> Brokers</a>

* AMQP (Advanced Message Queuing Protocol)
* ZeroMQ (zMQ)
* Redis, which receives log events on the central server and acts as a buffer (default port 6379).
Because Redis traffic is not encrypted, use it only with stunnel or for public information.

The front-end server watches files as set in this .conf, which uses just a few of the
<a target="_blank" href="">
file plugin's many options</a>.

    input {
      file {
        type => "syslog"
        path => ["/var/log/secure", "/var/log/messages"]
        exclude => ["*.gz"]
      }
    }
    output {
      stdout { }
      redis {
        host => ""
        data_type => "list"
        key => "logstash"
      }
    }

The backend:

    input {
      redis {
        host => ""
        type => "redis-input"
        data_type => "list"
        key => "logstash"
      }
    }
    output {
      stdout { }
      elasticsearch { cluster => "logstash" }
    }
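The two configs above cooperate through a Redis list used as a first-in, first-out buffer. The front end appends events and the back end takes them from the other end, so bursts pile up in the list instead of overwhelming the indexer. Below is a minimal in-process simulation, with a Python `deque` standing in for the Redis list named by `key => "logstash"`. No Redis server is involved, and `ship` and `index_next` are hypothetical names.

```python
import json
from collections import deque

# Stand-in for the Redis list (a real setup talks to a Redis server).
redis_list = deque()

def ship(message, source_type="syslog"):
    """Front end: encode the event as JSON and append it to the tail of the list."""
    redis_list.append(json.dumps({"message": message, "type": source_type}))

def index_next():
    """Back end: take the oldest event from the head of the list (None if empty)."""
    return json.loads(redis_list.popleft()) if redis_list else None

ship("sshd: Accepted publickey for wilson")
ship("kernel: eth0 link up")
print(index_next())  # the first event shipped is the first one indexed
```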

### <a name="LogstashFilters"> Logstash Filters</a>
Grok uses named patterns (labels) instead of raw regex patterns.

* <strong>grok</strong> uses patterns to extract data into fields.
* <strong>date</strong> parses timestamps from fields to standardize them into a "canonical" date format
* <strong>mutate</strong> renames, removes, replaces, and modifies fields in events
* <strong>geoip</strong> determines geographic information from IP addresses (via MaxMind)
* <strong>csv</strong> parses comma-separated values (or values split by another separator)
* <strong>kv</strong> parses key-value pairs in event data
* grep
* alter
* multiline
* <strong>ruby</strong> to run arbitrary Ruby-language code.
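A rough Python sketch of what the kv filter extracts is shown below. It assumes pairs separated by spaces with `=` between key and value, which are the kv filter's defaults (configurable through its `field_split` and `value_split` options). The double-quote handling is this sketch's own choice, and `kv` is a hypothetical helper. As with the filter, values come out as strings.

```python
import re

# key=value where the value is either "double-quoted" or runs up to the next space
PAIR = re.compile(r'(\w+)=("[^"]*"|\S*)')

def kv(text):
    """Extract key=value pairs into a dict; quoted values may contain spaces."""
    return {key: value.strip('"') for key, value in PAIR.findall(text)}

print(kv('user=wilson action=login status="failed attempt" port=22'))
# {'user': 'wilson', 'action': 'login', 'status': 'failed attempt', 'port': '22'}
```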

## <a name="Puppet"> Puppet Modules</a>

## <a name="LogstashForwarder"> Logstash Forwarder on Shippers</a>
Configure for scale by using a Logstash Forwarder and RabbitMQ between a Logstash producer and a Logstash consumer.

Logstash Forwarder is written in the programming language Go.

<a target="_blank" href="">
VIDEO: Logstash</a>

## <a name="ElasticConfig"> Elasticsearch Configuration</a>
On a Mac with Homebrew installed:

brew install elasticsearch nginx

Configure Elasticsearch is described at

To start Elasticsearch, go to the bin folder and run the elasticsearch script.

Each index is stored in shards (each an Apache Lucene instance) of two types: primary and replica.
Documents are written to primary shards first, then copied to replicas.
By default, five primary shards are created for each new index.
This number can be set when the index is created, but not changed afterwards.

Each primary shard has one replica by default, but the number of replicas can be changed dynamically to scale out or to make an index more resilient.
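The shard arithmetic, and the reason the primary count is fixed, can be sketched in Python. Elasticsearch routes each document with `shard = hash(_routing) % number_of_primary_shards`, where the routing value defaults to the document ID. Changing the divisor would send existing documents to different shards. Here `zlib.crc32` is only a stand-in for Elasticsearch's own hash function, and the helper names are hypothetical.

```python
import zlib

def total_shards(primaries, replicas_per_primary):
    """Every primary plus its replicas: primaries * (1 + replicas)."""
    return primaries * (1 + replicas_per_primary)

def pick_shard(doc_id, primaries):
    """Route a document to a primary shard as hash(id) mod number_of_primaries.
    crc32 is a stand-in; Elasticsearch uses its own hash function."""
    return zlib.crc32(doc_id.encode("utf-8")) % primaries

print(total_shards(5, 1))  # 10 shards with the defaults described above
doc = "wilson-42"
print(pick_shard(doc, 5), pick_shard(doc, 6))  # may differ, which is why the primary count is fixed
```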

Elasticsearch distributes shards across the available nodes so that a primary shard and its replica are never placed on the same <strong>node</strong>. Nodes join an Elasticsearch cluster automatically.

Elasticsearch moves shards automatically from one node to another when a node fails or new nodes are added.
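The placement rule can be pictured with a simplified round-robin sketch that puts each primary and its one replica on different nodes. This is not Elasticsearch's real allocator, which also weighs disk usage and shard balance, and `allocate` is a hypothetical helper.

```python
def allocate(num_primaries, nodes):
    """Round-robin sketch: each primary gets one replica on a different node.

    Returns {shard_number: (primary_node, replica_node)}. With a single node the
    replica stays unassigned (None), because it may not sit beside its primary.
    """
    layout = {}
    for shard in range(num_primaries):
        i = shard % len(nodes)
        primary = nodes[i]
        replica = nodes[(i + 1) % len(nodes)] if len(nodes) > 1 else None
        layout[shard] = (primary, replica)
    return layout

print(allocate(5, ["node-a", "node-b", "node-c"]))
# {0: ('node-a', 'node-b'), 1: ('node-b', 'node-c'), 2: ('node-c', 'node-a'), 3: ('node-a', 'node-b'), 4: ('node-b', 'node-c')}
print(allocate(5, ["node-a"]))  # replicas unassigned: cluster health would be "yellow"
```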

In Kibana 3's config.js, line 32 sets the Elasticsearch URL; read the comments there on why you might not want localhost. For a dev box only:

    elasticsearch: "http://localhost:9200",

Enable CORS so Kibana 3 works with Elasticsearch 1.4, which disables CORS by default. Edit:

    vi /usr/local/Cellar/elasticsearch/1.4.3/config/elasticsearch.yml

and add these lines for Kibana 3 compatibility:

    http.cors.enabled: true
    http.cors.allow-origin: http://localhost:8080

The services command comes from the Homebrew services tap:

    $ brew services restart elasticsearch

Make sure nginx starts by itself.

The nginx config is in /usr/local/etc/nginx/nginx.conf if you need to look at it.

It won't need any edits for Kibana 3, which is just JS/HTML in a directory.

Browse to http://localhost:8080/kibana. You should see a Kibana page.

Now, let's change the default dashboard to the Logstash one:

    cd /usr/local/var/www/kibana/app/dashboards
    mv default.json default.json.orig
    cp logstash.json default.json

Refresh the Kibana page. The Logstash dashboard is now the default.

## <a name="KibanaConfig"> Kibana Configuration</a>
Kibana 4 installs with its own Node.js server, so it doesn't need a separate web server. Kibana 3, by contrast, is static JS/HTML served by a web server such as nginx.

A default <strong>config.js</strong> comes with the Kibana 3 installer.

In Elasticsearch, a single node acts as master, data, and client node all at once.
In larger clusters, nodes specialize into dedicated data and client nodes.

### <a name="Docker"> Docker package</a>

## <a name="Demo"> Demo</a>
The demo used VirtualBox and Vagrant.

## <a name="Kibana"> Kibana Dashboard</a>
Kibana replaced the Logstash Web UI.
Early versions (Kibana 2) were built in Ruby with the Sinatra framework.

For more scale, stream processors can sit between intermediate brokers:
* Storm
* Spark cache
* Samza

Flume can send data to HDFS, from which es-hadoop can load it into Elasticsearch.

## <a name="Watcher"> Watcher</a>

Sends notifications via PagerDuty

## <a name="RockStars"> ELK Rock Stars</a>
This tutorial is based on information from these people and their work:

* at Elasticsearch
Mar 17, 2015

* Andrew Puch @gmail
who runs
and the #devopsengineers Slack channel at
created as
which details how to set up Elasticsearch.

* Steve Mayzak, Director of Sales Engineering @ Elasticsearch
What is ELK and how can it help you discover, visualize and analyze your data?
Oct. 14, 2014

* Tim and Anna Roes in Germany:

* Jeff Sogolov:
Visualizing Logs Using ElasticSearch, Logstash and Kibana
May 16, 2014

* Agitare Technologies
What is ELK and how can it help you discover, visualize and analyze your data?

* Spencer Alger (@spencerAlger) 
His intro to the ELK Stack Feb 25, 2015
He works on JS libraries at Elastic

* James Turnbull (at Kickstarter)
wrote The Logstash v1.5 Book, Kindle Edition ($9.99)

* Alberto Paro
wrote ElasticSearch Cookbook ($20.44)

* Clinton Gormley and Zachary Tong
wrote Elasticsearch: The Definitive Guide, Feb 7, 2015 ($22.55)

* Radu Gheorghe, Matthew Lee Hinman, and one more author
wrote Elasticsearch in Action, paperback, August 31, 2015

## <a name="Social"> Social</a>
A collection of Kibana dashboards from the community


