ElasticSearch Cookbook

Setting up ElasticSearch for Linux systems (advanced)

If you are using a Linux system, typically on a server, you need to perform some extra setup to gain performance or to resolve production problems with many indices.

Getting ready

You need a working ElasticSearch installation.

How to do it...

To improve performance on Linux systems, perform the following steps:

  1. First, you need to change the current limits for the user who runs the ElasticSearch server. In these examples, we call this user elasticsearch.
  2. To allow elasticsearch to manage a large number of files, you need to increase the number of file descriptors (the maximum number of open files) that the user can have. To do so, edit your /etc/security/limits.conf file and add the following lines at the end:
    elasticsearch       -       nofile          999999
    elasticsearch       -       memlock         unlimited

    Then, the elasticsearch user must log in again (or the machine must be restarted) to be sure that the changes are applied.

  3. To prevent the operating system from swapping the ElasticSearch memory, you need to set this parameter in elasticsearch.yml:
    bootstrap.mlockall: true
  4. To fix the memory size used by the ElasticSearch server, set ES_MIN_MEM and ES_MAX_MEM to the same value in $ES_HOME/bin/elasticsearch.in.sh. Alternatively, you can set ES_HEAP_SIZE, which automatically initializes ES_MIN_MEM and ES_MAX_MEM to the provided value. A quick way to verify these settings is shown after this list.

How it works...

The standard limit of file descriptors (the maximum number of open files for a user) is typically 1024. When you store a lot of records in several indices, you run out of file descriptors very quickly; your ElasticSearch server then becomes unresponsive and your indices may become corrupted, causing data loss.

Changing the limit to a very high number ensures that ElasticSearch never hits the maximum number of open files.

The other settings prevent the operating system from swapping ElasticSearch's memory to disk and give a performance boost in a production environment. They matter because, during indexing and searching, ElasticSearch creates and destroys a lot of objects in memory, and the JVM garbage collector must periodically scan the whole heap to reclaim them. If you don't set bootstrap.mlockall: true, the operating system may swap parts of the heap to disk, and every garbage collection that touches swapped pages stalls on slow disk I/O. With this setting, the heap is locked in RAM, so garbage collection runs entirely in memory, with a huge performance gain.

There's more...

This recipe covers two common errors that happen in production:

  • "Too many open files", that can corrupt your indices and your data
  • Slow performance in search and indexing due to garbage collector