Best practices for index generation and storage

flathers edited this page Oct 10, 2018 · 3 revisions

General concerns

Issues to consider

  • Index segmentation. A common practice appears to be keeping many smaller indices and running queries across them, e.g. giving the indices a shared prefix and querying with a wildcard in the index name.

  • Suggested number of shards to be deployed

    • The challenge is to figure out the right number of shards, because you only get to make the decision once per index. The Elastic team recommends starting with one shard, sending “realistic” traffic, and seeing where it breaks. Then add more shards and retest until you find the right number. The key is to pick a timescale to plan for: you will eventually have to reshard; the only question is when.
  • Hardware suggestions, e.g. single or multiple VMs, memory, disk, CPU balance

    • Nothing yet
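
The index segmentation and shard-count points above can be sketched in Python. This is a minimal illustration under stated assumptions, not a description of our deployment: the `apache-` prefix, the one-index-per-day naming scheme, and the helper names are all hypothetical. `number_of_shards` and `number_of_replicas` are the standard Elasticsearch index settings; the primary shard count is the value that has to be chosen when the index is created.

```python
from datetime import date

INDEX_PREFIX = "apache-"  # hypothetical prefix, not taken from our deployment


def daily_index_name(day: date, prefix: str = INDEX_PREFIX) -> str:
    """Name of the small per-day index that data for `day` would go to."""
    return f"{prefix}{day:%Y.%m.%d}"


def wildcard_pattern(prefix: str = INDEX_PREFIX) -> str:
    """Index expression that matches every index sharing the prefix."""
    return f"{prefix}*"


def index_settings(shards: int, replicas: int = 1) -> dict:
    """Settings body for index creation; the primary shard count is set here."""
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }


if __name__ == "__main__":
    print(daily_index_name(date(2018, 10, 10)))
    print(wildcard_pattern())
    print(index_settings(1))
```

Assuming an elasticsearch-py client instance named `ES`, the results would be passed along the lines of `ES.indices.create(index=daily_index_name(day), body=index_settings(1))` for creation and `ES.search(index=wildcard_pattern(), body=query)` for a query across all the small indices. Starting with `index_settings(1)` matches the "start with one shard and test" advice above.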

Outline some key best practices for index creation and management.

What is the correct procedure for defining an index so that field types are optimally defined?

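    # ES is assumed to be an elasticsearch-py client instance,
    # and indexname the name of the index to create.
    # Declaring the mapping up front keeps Elasticsearch from guessing field types.
    indexconfig = {
        "mappings": {
            "apacheLine": {
                "properties": {
                    "sessionid": {"type": "long"},
                    "searchevent": {"type": "boolean"}
                }
            }
        }
    }
    ES.indices.create(index=indexname, body=indexconfig)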
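
Because a mapping is defined per index, a setup with many smaller indices needs the same field types applied to every new index. One possible approach, offered as a sketch and not as our current process, is an index template that Elasticsearch applies to any new index whose name matches a pattern. The `apache-*` pattern, the `apache-logs` template name, and the `build_template` helper are hypothetical; the `index_patterns` key and the `apacheLine` mapping type assume the 6.x template format, consistent with the mapping example above.

```python
def build_template(pattern: str, properties: dict, doc_type: str = "apacheLine") -> dict:
    """Build a 6.x-style index template body with explicit field types."""
    return {
        "index_patterns": [pattern],
        "mappings": {doc_type: {"properties": properties}},
    }


# Field types copied from the mapping example above.
APACHE_FIELDS = {
    "sessionid": {"type": "long"},
    "searchevent": {"type": "boolean"},
}

# "apache-*" is a hypothetical pattern used only for illustration.
template = build_template("apache-*", APACHE_FIELDS)

# With an elasticsearch-py client instance `ES`, the template could be registered with:
#   ES.indices.put_template(name="apache-logs", body=template)
```

The request body is kept as a plain dictionary so it can be inspected or checked into version control before anything is sent to the cluster.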

Maintenance

Outline of current process

Apache Logs

  • Any information we should know about the Apache logs?
  • Source machine(s)?
  • File-based data source for the following:

Filebeat

  • [Dave will provide details]
  • Polling frequency?

Logstash

  • The Logstash config files in order of execution:
    • 10-beats-input.conf

      Here's what it does
      

      and here's why

    • 20-searchlog-filter.conf

      Here's what it does
      

      and here's why

    • 21-eventlog-filter.conf

      Here's what it does
      

      and here's why

    • 30-elasticsearch-output.conf

      Here's what it does
      

      and here's why

Elasticsearch Index