Stats

The stats metric aggregation is a simple multi-value metrics aggregation that computes a few basic statistics over numeric values. The values can be extracted either from numeric fields of the aggregated documents or generated by a script. See the Elasticsearch documentation on the stats metric aggregation.

The following statistics are returned:

  • min - minimum value
  • max - maximum value
  • sum - sum of all values
  • count - number of extracted values
  • avg - arithmetic mean of the values (sum / count)

Index Mapping

Let's build a new index with a mapping that could be used to store logs from a web service.

✅ Create a new index named stats_aggs with the following mapping.

curl -X PUT 'http://localhost:9200/stats_aggs' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "properties": {
      "response_time_in_ms": {
        "type": "integer"
      },
      "status_code": {
        "type": "keyword"
      },
      "url": {
        "type": "text"
      }
    }
  }
}'

For the purpose of these examples, each document describes a single web service response with its url, status_code and response_time_in_ms.

The mapping contains the following fields:

  • url with field type text
  • status_code with field type keyword
  • response_time_in_ms with field type integer

Add Documents

Let's add a few documents that represent web service responses.

✅ Bulk upload documents to index stats_aggs

curl -H 'Content-Type: application/x-ndjson' -X POST 'http://localhost:9200/stats_aggs/_bulk' -d '
{"index":{"_index":"stats_aggs"}}
{"url": "http://example.com/1", "status_code": 200, "response_time_in_ms": 50 }
{"index":{"_index":"stats_aggs"}}
{"url": "http://example.com/1", "status_code": 200, "response_time_in_ms": 25 }
{"index":{"_index":"stats_aggs"}}
{"url": "http://example.com/1", "status_code": 200, "response_time_in_ms": 30 }
{"index":{"_index":"stats_aggs"}}
{"url": "http://example.com/1", "status_code": 200, "response_time_in_ms": 100 }
{"index":{"_index":"stats_aggs"}}
{"url": "http://example.com/1", "status_code": 200, "response_time_in_ms": 5 }
{"index":{"_index":"stats_aggs"}}
{"url": "http://example.com/1", "status_code": 200, "response_time_in_ms": 15 }
{"index":{"_index":"stats_aggs"}}
{"url": "http://example.com/1", "status_code": 200, "response_time_in_ms": 18 }
{"index":{"_index":"stats_aggs"}}
{"url": "http://example.com/1", "status_code": 200, "response_time_in_ms": 25 }
{"index":{"_index":"stats_aggs"}}
{"url": "http://example.com/redirect", "status_code": 302, "response_time_in_ms": 25 }
{"index":{"_index":"stats_aggs"}}
{"url": "http://example.com/2", "status_code": 201, "response_time_in_ms": 25 }
{"index":{"_index":"stats_aggs"}}
{"url": "http://example.com/2", "status_code": 201, "response_time_in_ms": 35 }
{"index":{"_index":"stats_aggs"}}
{"url": "http://example.com/2", "status_code": 201, "response_time_in_ms": 12 }
{"index":{"_index":"stats_aggs"}}
{"url": "http://example.com/2", "status_code": 500 }
{"index":{"_index":"stats_aggs"}}
{"url": "http://example.com/2", "status_code": 500 }
'

Exercise

✅ Build a stats aggregation query on field response_time_in_ms.

A solution

The following query uses a stats aggregation named response_stats.

curl -X POST 'http://localhost:9200/stats_aggs/_search?pretty' -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "response_stats": {
      "stats": {
        "field": "response_time_in_ms"
      }
    }
  }
}'
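As a cross-check, the expected result for the sample data can be computed locally. The sketch below is plain POSIX shell with awk; the helper name stats and the hand-copied list of values are illustrative assumptions, not part of Elasticsearch. The two documents with status_code 500 have no response_time_in_ms, so they are left out, which mirrors how the stats aggregation ignores documents without a value for the field by default.

```shell
#!/bin/sh
# stats: hypothetical helper for this sketch. Reads one number per line on
# stdin and prints the five values the "stats" aggregation reports.
stats() {
  awk '
    {
      v = $1 + 0                          # force numeric comparison
      if (NR == 1 || v < min) min = v
      if (NR == 1 || v > max) max = v
      sum += v
      count++
    }
    END {
      if (count == 0) { print "count=0"; exit }   # no values: avoid dividing by zero
      printf "count=%d min=%g max=%g sum=%g avg=%.4f\n", count, min, max, sum, sum / count
    }'
}

# response_time_in_ms values copied by hand from the bulk request above.
# The two status_code 500 documents have no response_time_in_ms and are omitted.
printf '%s\n' 50 25 30 100 5 15 18 25 25 25 35 12 | stats
```

With the sample data this prints count=12 min=5 max=100 sum=365 avg=30.4167, which can be compared against the response_stats object in the search response.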

What are good use cases for the stats aggregation?

There is also an extended stats aggregation (see the Elasticsearch documentation on the extended stats aggregation). It is an extended version of the stats aggregation that returns additional statistics about the numeric field.

✅ Build the same aggregation using extended_stats on field response_time_in_ms.

A solution

The following query uses an extended_stats aggregation named response_stats.

curl -X POST 'http://localhost:9200/stats_aggs/_search?pretty' -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "response_stats": {
      "extended_stats": {
        "field": "response_time_in_ms"
      }
    }
  }
}'

The response contains additional statistics, such as sum_of_squares, variance, std_deviation and std_deviation_bounds.
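To see where some of the additional values come from, the following sketch computes a subset of them locally for the same twelve values. It assumes that variance and std_deviation are population statistics (dividing by count, not count - 1) and that std_deviation_bounds spans the default of two standard deviations around the mean; check these assumptions against the documentation for your Elasticsearch version. The helper name extended_stats_sketch is hypothetical.

```shell
#!/bin/sh
# extended_stats_sketch: hypothetical helper. Reads one number per line and
# prints a subset of the extra fields that "extended_stats" adds.
# Assumptions: population variance/std deviation, bounds at mean +/- 2 sigma.
extended_stats_sketch() {
  awk '
    { v = $1 + 0; sum += v; sum_sq += v * v; count++ }
    END {
      if (count == 0) { print "count=0"; exit }   # no values: avoid dividing by zero
      avg = sum / count
      variance = sum_sq / count - avg * avg      # population variance
      std = sqrt(variance)
      printf "sum_of_squares=%g variance=%.4f std_deviation=%.4f\n", sum_sq, variance, std
      printf "std_deviation_bounds.upper=%.4f std_deviation_bounds.lower=%.4f\n", avg + 2 * std, avg - 2 * std
    }'
}

# Same hand-copied response_time_in_ms values as before (status 500 documents omitted).
printf '%s\n' 50 25 30 100 5 15 18 25 25 25 35 12 | extended_stats_sketch
```

For the sample data this prints sum_of_squares=17843, variance=561.7431 and std_deviation=23.7011, with bounds of 77.8189 and -16.9856. A negative lower bound is a hint that a symmetric "mean plus or minus two standard deviations" interval does not fit skewed data such as response times well.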

The fields returned by the extended_stats aggregation depend on the Elasticsearch version (compare ES 7.8 and 7.9), so check the documentation for the version you use.