Avoids putting two copies of the same shard on nodes with the same attribute (e.g. rack, availability zone). For example:
node.attr.availability_zone: us-east1 # in elasticsearch.yml
Awareness is enabled at the cluster level:
curl -XPUT localhost:9200/_cluster/settings?pretty -d '{
"persistent" : {
"cluster.routing.allocation.awareness.attributes" : "availability_zone"
}
}'
Shards of an index can prefer/avoid nodes with certain attributes. Good for having hot/cold tiers:
node.attr.temperature: hot # in elasticsearch.yml
At index creation, you can assign shards to the hot nodes:
curl -XPUT localhost:9200/logs01 -d '{
"settings": {
"index.routing.allocation.include.temperature": "hot"
}
}'
Later on, you can change this value to cold to move the shards to nodes having temperature set to cold.
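For example, a minimal sketch assuming the logs01 index and the temperature attribute from above:
curl -XPUT localhost:9200/logs01/_settings -d '{
"index.routing.allocation.include.temperature": "cold"
}'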
Avoids the domino effect of relocation when a node is restarted or temporarily unavailable:
curl -XPUT localhost:9200/$INDEX/_settings -d '{
"settings": {
"index.unassigned.node_left.delayed_timeout": "5m"
}
}'
Defaults to 10% of heap:
indices.queries.cache.size: 7% # in elasticsearch.yml
By default, queries running in the filter context will be cached if they run repeatedly, and only on larger segments. You can override this and cache everything in elasticsearch.yml:
index.queries.cache.everything: true
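For reference, a sketch of a search whose range clause runs in the filter context (index and field names are placeholders), making it a candidate for this cache:
curl localhost:9200/logs01/_search?pretty -d '{
"query": {
"bool": {
"filter": { "range": { "timestamp": { "gte": "now-1h" } } }
}
}
}'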
Caches results of aggregations on indices that haven’t changed. Defaults to 1% of heap:
indices.requests.cache.size: 2%
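You can also force the request cache on or off for an individual search via the request_cache parameter. A sketch with placeholder index and field names (by default only size=0 requests, like pure aggregations, are cached):
curl 'localhost:9200/logs01/_search?request_cache=true&size=0' -d '{
"aggs": { "top_tags": { "terms": { "field": "tags" } } }
}'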
A node-level buffer for indexing, before a flush will commit to disk. Defaults to 10% of heap:
indices.memory.index_buffer_size: 5%
Big arrays used by aggregations are put here so they can be reused. Defaults to 10% of heap:
cache.recycler.page.limit.heap: 5%
The only way to do sorting/aggregations on text fields. Avoid it if possible. If not, limit it through per-request circuit breakers:
indices.breaker.fielddata.limit: 10%
And by limiting the overall size:
indices.fielddata.cache.size: 20%
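For context, recent versions keep fielddata disabled on text fields unless you enable it in the mapping. A sketch with placeholder index, type and field names:
curl -XPUT localhost:9200/logs01/_mapping/log -d '{
"properties": {
"message": { "type": "text", "fielddata": true }
}
}'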
Might be worth merging indices that don’t change into a handful of big segments:
curl -XPOST localhost:9200/$INDEX/_forcemerge?max_num_segments=5
Segments need to be rewritten (merged) for a new compression level to take effect, so change the codec before force merging:
curl -XPUT localhost:9200/$INDEX/_settings -d '{
"index.codec": "best_compression"
}'
size = number of threads (requests processed in parallel), and queue_size = number of requests that can wait in the queue:
thread_pool.search.size: 8
thread_pool.search.queue_size: 5000
thread_pool.bulk.size: 12
thread_pool.bulk.queue_size: 500
There are multiple knobs here. Most importantly:
- Segments per tier. Defaults to 10. Higher values allow for more segments, giving better indexing throughput at the expense of search latency, disk space, memory and open file handles.
- Max merge at once. Defaults to 10. Lower values lower the impact of merging, but make the process slower (which can potentially throttle indexing).
- Max merged segment. Defaults to 5GB. Lower values result in fewer merges of large segments, but require more merges of small segments, trading spikes for overall load.
curl -XPUT localhost:9200/$INDEX/_settings -d '{
"index.merge.policy": {
"segments_per_tier": 50,
"max_merge_at_once": 50,
"max_merged_segment": "1gb"
}
}'
Shrink an index into a new one with fewer shards (the new count must be a factor of the current number of shards):
curl -XPOST localhost:9200/logs01/_shrink/logs01_shrinked -d '{
"settings": {
"index.number_of_shards": 1
}
}'
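Note that shrinking requires the source index to be read-only and to have a copy of every shard on a single node, which has to be set up before running _shrink. A sketch with a placeholder node name:
curl -XPUT localhost:9200/logs01/_settings -d '{
"index.routing.allocation.require._name": "NODE_NAME",
"index.blocks.write": true
}'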
curl localhost:9200/_cat/health?v
v is for “verbose”, shows column headers. Gives number of nodes, shards (started, initializing, relocating) and cluster color:
- green: all primaries and replicas are up
- yellow: all primaries are up, but not all replicas
- red: not all primaries are up
curl localhost:9200/_cat/nodes?v
Shows figures like load and heap usage of nodes. The help parameter lists all available columns, which you can then select with the h parameter to get other metrics.
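For example (exact column names may vary by version):
curl 'localhost:9200/_cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m'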
curl localhost:9200/_cat/allocation?v
How many shards are on each node and how much disk space they take (vs free space).
curl localhost:9200/_cat/indices?v
How big is each index; how many shards and replicas it has.
curl localhost:9200/_cat/shards?v
How big is each shard and on which node it is. Shows whether a shard is STARTED, UNASSIGNED, INITIALIZING or RELOCATING. You can easily grep through those values when you have many shards.
How big each segment in each shard is (including memory usage). You can filter by index, for example:
curl localhost:9200/_cat/segments/$INDEX?v
If you look at the files, you’ll see different extensions. Most importantly (in terms of memory and storage):
- .cfs, .cfe: These are compound segments
- .fdt: Stored fields (like _source)
- .tim: Term dictionary, used when searching in indexed fields
- .doc: Frequency of each term in each document (for scoring)
- .pos: Positional information (for phrase searches)
- .pay: Payloads, most notably character offsets (for the Postings-based highlighters)
- .nvd, .nvm: Field lengths (a.k.a. norms - used for scoring)
- .dvd, .dvm: Doc values (used for sorting and aggregations)
- .tv?: Term vectors (used for the term-vector-based highlighters)
- .dii, .dim: Point values (for geo fields as well as numerics)
More information can be found here (the link is for Lucene 6.x, which Elasticsearch 5.x uses; for other versions you may need to change the version in the URL): https://lucene.apache.org/core/6_0_0/core/org/apache/lucene/codecs/lucene60/package-summary.html
In-progress operations in your cluster. You’d typically catch long-running ones (e.g. snapshot, force merge) or the ones that get queued up when the cluster is in trouble and the master gets overloaded (e.g. lots of mapping/cluster state updates).
curl localhost:9200/_cat/thread_pool?v
How many threads are active (working) on searches, bulk indexing and so on. You can also see how many are enqueued (queue) compared to queue_size and how many were rejected (usually because the queue was full).
curl localhost:9200/_cat/fielddata?v
How much heap field data (the in-memory equivalent of doc values) takes. Per field, per node.
curl localhost:9200/_nodes/stats?pretty
Gives back statistics of all nodes in the cluster. You can filter nodes, too, like _nodes/_local/stats just for the current node. Relevant metrics include:
- How much time was spent in queries, fetches, indexing, merging, etc.
- How much memory current segments take, broken down by type (e.g. term dictionary, doc values), which is a good indicator of the live set
- Current and maximum amount of heap usage per pool. A good indicator of memory pressure and of whether the heap is sized right
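You can also limit the output to specific metric groups, for example indices and JVM stats:
curl localhost:9200/_nodes/stats/indices,jvm?pretty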
curl localhost:9200/_nodes/hot_threads
Tells you what’s keeping Elasticsearch busy. Add type=wait or type=block to see what’s keeping it from being busy. You can also filter nodes like _nodes/_local/hot_threads
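For example, on the local node:
curl 'localhost:9200/_nodes/_local/hot_threads?type=wait'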
top, iotop, dstat, iostat help figure out what the bottleneck is. Usually:
- Aggregations are CPU-intensive and memory-intensive. The last part may translate into high GC (check the logs for longer GC events)
- Full-text search (without aggregations) is IO latency sensitive
- Indexing (especially merging) is CPU intensive and IO throughput intensive
- Snapshots, replication and relocation are network and disk intensive
Shows all the decisions that prevent a particular shard from being allocated on each node:
curl localhost:9200/_cluster/allocation/explain?pretty -d'{
"index": "INDEX_NAME",
"shard": 0,
"primary": true
}'
Also accepts a current_node value in the body to show the explanation only for that node.
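A sketch, assuming the current_node field (available in more recent versions of this API), with placeholder values:
curl localhost:9200/_cluster/allocation/explain?pretty -d '{
"index": "INDEX_NAME",
"shard": 0,
"primary": true,
"current_node": "NODE_NAME"
}'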
curl localhost:9200/$INDEX/_shard_stores?pretty
Returns the last exception that occurred while opening shards of this index.
How many shards an index can have on each node (good for force-balancing the cluster):
curl -XPUT localhost:9200/$INDEX/_settings -d '{
"index.routing.allocation.total_shards_per_node": 2
}'
Prevents nodes from running out of disk.
Low watermark: when to stop allocating new shards. High watermark: when to relocate existing shards.
curl -XPUT localhost:9200/_cluster/settings -d '{
"persistent" : {
"cluster.routing.allocation.disk.watermark.low" : "70%",
"cluster.routing.allocation.disk.watermark.high" : "85%"
}
}'
Allows you to try and allocate a shard manually, or cancel a replication/relocation, or to move a shard:
curl -XPOST localhost:9200/_cluster/reroute -d '{
"commands" : [ {
"move" :
{
"index" : "INDEX_NAME", "shard" : SHARD_NUMBER,
"from_node" : "SOURCE_NODE", "to_node" : "DESTINATION_NODE"
}
}
]
}'
How many shard recoveries (e.g. for replication or relocation) can run concurrently on each node:
curl -XPUT localhost:9200/_cluster/settings?pretty -d '{
"persistent" : {
"cluster.routing.allocation.node_concurrent_recoveries": 2
}
}'
How many shards can move around, cluster-wide:
curl -XPUT localhost:9200/_cluster/settings?pretty -d '{
"persistent" : {
"cluster.routing.allocation.cluster_concurrent_rebalance": 2
}
}'
How much bandwidth can recovery/rebalancing take:
curl -XPUT localhost:9200/_cluster/settings?pretty -d '{
"persistent" : {
"indices.recovery.max_bytes_per_sec": "20mb"
}
}'
Trade durability for performance (less IOPS):
curl -XPUT localhost:9200/$INDEX/_settings -d '{
"index.translog.durability": "async"
}'
If survivor space is mostly full, you can increase it by lowering -XX:SurvivorRatio in jvm.options (default is 8 on most platforms).
If the whole young generation (survivor + eden) is mostly full, you can increase it via -XX:NewSize.
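Illustrative values only (not defaults), to put in jvm.options:
-XX:SurvivorRatio=6
-XX:NewSize=2g
-XX:MaxNewSize=2g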
On large heaps (>30GB, usually you’d want to stay under 30GB to get compressed pointers, but 60-90GB may be needed on some big boxes), using G1 instead of CMS should help. To do that, replace:
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
With:
-XX:+UseG1GC
Quick way to free some heap:
curl -XPOST localhost:9200/_cache/clear
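You can also clear only specific caches, for example just fielddata:
curl -XPOST 'localhost:9200/_cache/clear?fielddata=true'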