
January 6, 2021

A faster Docker for ElasticSearch

Yireo Blog Post

Over the last couple of years, my development stack has grown: MySQL and PHP were always in there, and so was Redis. But since I started working with Magento 2.4, Shopware 6 and Vue Storefront, ElasticSearch has been added to that stack as well. To keep my productivity up, I tried to make it as fast as possible. Here's my setup.

... in Docker

First of all, I'm using Docker for this. Some people love Docker, some people hate it. But the biggest issues with Docker seem to involve the file sync between the application at hand (Magento, Shopware) and the Docker container. With ElasticSearch, this problem does not exist, because there are no files to sync. You simply want ElasticSearch to be there, at your disposal.

I'm personally using a hybrid development environment: I'm running Apache 2.4 with mod_php (PHP 7.4) and MySQL 5.7 natively. But I also have Docker containers for MySQL 8, PHP 7.3 and PHP 8 lying around. For me, switching versions is the biggest win of Docker: whenever I need ElasticSearch 6, 7 or 8, I simply fire up the relevant Docker container (via a script) and I'm done.
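
Such a version-switching script is not shown in this post. The following is a minimal sketch of what it could look like. The function names, the es-dev container name and the image tags are illustrative assumptions, so check which tags actually exist on Docker Hub before relying on them.

```shell
#!/bin/sh
# Hypothetical version-switching helper (not the actual script from this post).
# Image tags are examples only; extend the case statement for other versions.

# Map an ElasticSearch major version to a Docker image tag.
es_image_for() {
  case "$1" in
    6) echo "elasticsearch:6.8.0" ;;
    7) echo "elasticsearch:7.10.1" ;;
    *) echo "unsupported ElasticSearch version: $1" >&2; return 1 ;;
  esac
}

# Start a throwaway container for the requested major version.
start_es() {
  local image
  image="$(es_image_for "$1")" || return 1
  # Only one container can publish host port 9200, so remove a previous
  # container started by this helper (errors are ignored if none exists).
  docker rm -f es-dev >/dev/null 2>&1 || true
  docker run --rm -d --name es-dev -p 9200:9200 \
    -e "discovery.type=single-node" \
    "$image"
}
```

With this in place, start_es 6 replaces whatever version was running with a fresh ES 6.8 on port 9200. The tuning flags discussed below would go into the docker run line.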

Meet my Docker config

Tuning ElasticSearch therefore boils down to tuning Docker in my case. I've got a monster of a local machine with 16 CPUs (at least, that is what /proc/cpuinfo reports) and 64GB of RAM. Still, I want it to be faster. So one thing I always tend to do is figure out how much memory to assign to a specific application for it to run as fast as possible.

First of all, here's my Docker command:

docker run \
    --rm -it -d \
    -p 9200:9200 \
    -e "discovery.type=single-node" \
    -e "path.data=/opt/elasticsearch/volatile/data" \
    -e "path.logs=/opt/elasticsearch/volatile/logs" \
    -e "ES_JAVA_OPTS=-Xms128m -Xmx128m -server" \
    -e "ES_HEAP_SIZE=128m" \
    -e "MAX_LOCKED_MEMORY=100000" \
    --cpus=1 \
    --memory=512m \
    --memory-swappiness=0 \
    --memory-swap=512m \
    --tmpfs /opt/elasticsearch/volatile/data:rw \
    --tmpfs /opt/elasticsearch/volatile/logs:rw \
    --tmpfs /tmp:rw \
    elasticsearch:6.8.0

I've removed a couple of things that are specific to my case (name, network, IP), so we can discuss the more relevant parts. The container is started with the --rm flag, which means Docker removes it as soon as it stops. So every time the ElasticSearch container is started, it is empty. For me, this is the best setup: I work with Magento 2, Shopware 6 and VSF1, totalling a dozen development sites plus numerous educational sites. And to fill this fresh ES instance with data, I simply need to kickstart the indexer of the application at hand.
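
What "kickstarting the indexer" looks like differs per application. Magento 2 rebuilds its search index with bin/magento indexer:reindex catalogsearch_fulltext, and Shopware 6 with bin/console es:index. A minimal sketch that picks the right one for a project root is shown below. The reindex_site name and the detection logic are hypothetical, and Vue Storefront is left out.

```shell
#!/bin/sh
# Hypothetical helper: refill a fresh (empty) ES container with the data
# of one project by running that project's own indexer.
reindex_site() {
  local root="$1"
  if [ -x "$root/bin/magento" ]; then
    # Magento 2: rebuild the catalog search index in ElasticSearch
    "$root/bin/magento" indexer:reindex catalogsearch_fulltext
  elif [ -x "$root/bin/console" ]; then
    # Shopware 6: index all entities into ElasticSearch
    "$root/bin/console" es:index
  else
    echo "no known indexer found in $root" >&2
    return 1
  fi
}
```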

Tuning ElasticSearch

The first step is to start ES with the setting discovery.type=single-node, so that clustering is disabled: the node elects itself as master and does not try to join other nodes. I'll comment on the cluster health at the end of this article. This setting is well documented and works fine.

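However, the memory side of ElasticSearch is less straightforward: ES is able to use quite a bit of memory (obviously depending on the amount of data that needs to be indexed). So tuning the Java flags -Xms (initial heap size) and -Xmx (maximum heap size) is a good thing; setting both to the same value avoids resizing the heap at runtime. The -server flag is a JVM option rather than an ES one. It is supposed to give some performance, although on 64-bit JVMs the server VM is already the default, so I'm not sure it makes a difference. I'm just using it. I also set the same memory amount in ES_HEAP_SIZE, but that variable stems from older ES releases: since ES 5, the heap is configured through ES_JAVA_OPTS (or jvm.options), so it is the -Xms and -Xmx values that count. The larger the catalog you are trying to index, the higher this memory needs to be. Theoretically, with my 64GB monster machine, I could give it a lot more memory, but I like to keep it tight.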
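
To see which heap the node actually ended up with, you can ask ES itself through the _cat/nodes API. This confirms, for example, that -Xmx128m took effect. A minimal sketch follows; the function name and the ES_URL variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the current and maximum JVM heap per node,
# as reported by ES itself.
ES_URL="${ES_URL:-http://localhost:9200}"

es_heap_max() {
  # h=... limits the _cat output to the listed columns
  curl -s "$ES_URL/_cat/nodes?h=heap.current,heap.max"
}
```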

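The memory of the Docker container is also capped at 512MB. I'm still experimenting a bit with how many CPUs (--cpus) are optimal. And swap is disabled: --memory-swappiness=0 turns off anonymous page swapping, and Docker only prevents swap use entirely when --memory-swap is set to the same value as --memory (a value of 0 is treated as unset).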
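
To verify what a running container actually got, docker inspect exposes both the memory limit and the combined memory-plus-swap limit, in bytes. If the two are equal, the container cannot use swap. A minimal sketch; the swap_disabled name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the container has a memory limit
# and its memory+swap limit equals that memory limit (i.e. no swap).
swap_disabled() {
  local limits mem swap
  limits="$(docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.MemorySwap}}' "$1")" || return 2
  mem="${limits%% *}"    # first field: memory limit in bytes
  swap="${limits##* }"   # last field: memory+swap limit in bytes
  [ "$mem" -gt 0 ] && [ "$mem" = "$swap" ]
}
```

For example, swap_disabled es-dev && echo "no swap" prints only when the limits match.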

Tmpfs filesystem

Now the thing that makes this ES instance fly is the use of tmpfs (just like I wrote about earlier with MySQL and integration tests). Both the data and the logs folders are reconfigured (via path.data and path.logs) to point to a dedicated path, just to avoid confusion, and that path is then mapped into memory using tmpfs.

The container is removed anyway when it shuts down (--rm). But with tmpfs, even the disk writes that Docker would normally handle while the container is running are gone: data and logs never leave RAM. The /tmp folder is also mapped to tmpfs, but this helps Linux more than it helps ES.
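
To double-check that the data folder, the logs folder and /tmp really live in memory, you can look at the filesystem type from inside the container. A minimal sketch follows. It assumes the image ships GNU stat (the CentOS-based ES 6.8 image should), and the check_tmpfs name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the filesystem type behind each volatile path
# inside the container, and fail as soon as one of them is not tmpfs.
check_tmpfs() {
  local container="$1" path fstype
  for path in /opt/elasticsearch/volatile/data /opt/elasticsearch/volatile/logs /tmp; do
    # GNU stat: -f reports on the filesystem, %T prints its type name
    fstype="$(docker exec "$container" stat -f -c %T "$path")" || return 2
    echo "$path $fstype"
    [ "$fstype" = "tmpfs" ] || return 1
  done
}
```

If you want to cap how much RAM such a mount may take, Docker's --tmpfs also accepts mount options such as size=256m (for example --tmpfs /opt/elasticsearch/volatile/data:rw,size=256m).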

Fixing the cluster health

Now, with the configuration above, you will see that ES is pretty fast. However, if you request the cluster health from ES (curl http://localhost:9200/_cluster/health?pretty) once the first indices have been created, the status is always yellow. Does that mean that the ES server is in trouble from the very start? No, it means that the default ES index settings still assume a cluster of multiple nodes. Every index gets a replica shard, and ES never allocates a replica on the same node as its primary, so on a single node those replicas stay unassigned.
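
In a script, it is handier to get just the status word (green, yellow or red). A minimal sketch without extra tools like jq follows; the es_health name and the ES_URL variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the cluster status (green/yellow/red).
ES_URL="${ES_URL:-http://localhost:9200}"

es_health() {
  # Compact JSON looks like {"cluster_name":"...","status":"yellow",...};
  # grep isolates the status pair, cut takes the value between the quotes.
  curl -s "$ES_URL/_cluster/health" \
    | grep -o '"status":"[a-z]*"' \
    | cut -d'"' -f4
}
```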

In the same script that I use to run the Docker command, I sleep for 30 seconds (to give ES time to start up) and then fire up the following command:

curl -XPOST 'http://localhost:9200/_template/default' -H 'Content-Type: application/json' \
    -d '{
  "index_patterns": ["*"],
  "order": -1,
  "settings": {
    "refresh_interval": "30s",
    "number_of_shards": "1",
    "number_of_replicas": "0"
  }
}'

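This sets the number of shards and replica shards for all indices created afterwards. Existing indices are not touched, so the template needs to be in place before the indexer runs. No replicas means that ES does not try to duplicate data across a cluster, so nothing is left unassigned and the health turns green. And by setting the number of primary shards to 1, we basically tell ES that no shard tuning is needed either. The order of -1 gives this template the lowest priority, so that any other matching template can still override it. I also raise the refresh_interval from the default of 1 second to 30 seconds. A refresh is what makes newly indexed documents searchable (by writing them into a new segment), so ES now does that work only every 30 seconds, on a "disk" that is by the way also memory. Nice. The flip side is that freshly indexed data can take up to 30 seconds to show up in search results.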
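
Two small refinements to such a startup script are sketched below, under the same assumptions (hypothetical function names, hypothetical ES_URL variable). First, instead of a fixed 30-second sleep, poll until ES answers and then let _cluster/health?wait_for_status=yellow block until the cluster is usable. Second, after the indexer has run, list every index that is not green. For indices created after the template was added, that list should be empty.

```shell
#!/bin/sh
# Hypothetical helpers for the startup script.
ES_URL="${ES_URL:-http://localhost:9200}"

# Wait until ES answers HTTP (at most $1 attempts, one second apart),
# then let ES itself block until the cluster is at least yellow.
wait_for_es() {
  local tries="${1:-60}"
  while ! curl -sf "$ES_URL" >/dev/null; do
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then
      echo "ElasticSearch did not come up" >&2
      return 1
    fi
    sleep 1
  done
  curl -sf "$ES_URL/_cluster/health?wait_for_status=yellow&timeout=30s" >/dev/null
}

# Print the name of every index whose health is not green.
non_green_indices() {
  curl -s "$ES_URL/_cat/indices?h=health,index" \
    | awk '$1 != "green" { print $2 }'
}
```

A script would then call wait_for_es, add the template, run the indexer and finally call non_green_indices.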

Conclusion

At some point, I will also try to make all of this work in CI/CD environments like GitHub Actions, so that integration tests in the cloud run smoothly as well. I used to find it a bit ridiculous that Magento 2.4 forces ElasticSearch upon you, but with Docker, that is quickly dealt with (in development at least). I hope you find this tuning story useful.


About the author


Jisse Reitsma is the founder of Yireo, extension developer, developer trainer and 3x Magento Master. His passion is for technology and open source. And he loves talking as well.
