ElasticSearch is nowadays everywhere in the Magento ecosystem (and slowly in Shopware as well). With Magento 2.4, ElasticSearch was added to the system requirements. But even though numerous developers are using ElasticSearch in production and in development, it is quite common to see the ElasticSearch cluster status being yellow. Is this ok?
What is yellow?
ElasticSearch is an application that can be managed via an HTTP API. This means that a simple request to http://localhost:9200 (assuming that's the right host and port) gives you a JSON reply with all kinds of details.
There is also a health check endpoint http://localhost:9200/_cluster/health which gives an output like the following:
{
"cluster_name" : "docker-cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 1,
"active_shards" : 1,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 1,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 50.0
}
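When scripting against this endpoint, it can be handy to extract only the status field. Below is a minimal sketch, assuming a POSIX shell with grep and sed available (no jq). The function name es_health_status is hypothetical and not part of ElasticSearch:

```shell
#!/bin/sh
# Hypothetical helper: print the cluster "status" (green, yellow or red)
# from /_cluster/health JSON read on stdin. Works for both the compact
# and the pretty-printed output, without needing jq.
es_health_status() {
  # Grab the first "status" : "<word>" pair, then keep only <word>
  grep -o '"status"[[:space:]]*:[[:space:]]*"[a-z]*"' \
    | head -n 1 \
    | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Usage, assuming ElasticSearch listens on localhost:9200:
#   curl -s http://localhost:9200/_cluster/health | es_health_status
```

Piping the health output shown above through this function would print yellow.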
Amongst various values, there is the status, which in this case is yellow. You can also wait for the cluster status to go from red (or unknown) to yellow by calling the URL http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=50s (which returns as soon as the status is yellow or better, or after 50 seconds at the latest).
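In a provisioning or CI script, the same call can serve as a gate before indexing starts. The following is a sketch, assuming curl is installed; es_wait_for is a hypothetical name. It relies on the health API answering with HTTP 200 once the requested status (or a better one) is reached, and with HTTP 408 plus "timed_out" : true when the timeout expires first:

```shell
#!/bin/sh
# Hypothetical helper: wait until the cluster reaches the wanted status
# (or a better one) and succeed only if that happened before the timeout.
# Arguments (all optional): host, wanted status, timeout.
es_wait_for() {
  host="${1:-http://localhost:9200}"
  wanted="${2:-yellow}"
  timeout="${3:-50s}"
  # Only the HTTP status code matters: 200 means the status was reached,
  # 408 means the timeout expired first, 000 means curl could not connect.
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    "$host/_cluster/health?wait_for_status=$wanted&timeout=$timeout")
  [ "$code" = "200" ]
}

# Usage:
#   es_wait_for http://localhost:9200 yellow 50s && echo "cluster is ready"
```

Checking the HTTP code instead of parsing the body keeps the sketch free of any JSON tooling.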
However, there is also a color green. And if you ask me, green is better than yellow, right? Let's refer to the official manual for the meaning of these colors: on the shard level, a red status indicates that the specific shard is not allocated in the cluster, yellow means that the primary shard is allocated but its replicas are not, and green means that all shards are allocated. There you go.
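In other words, the worst shard wins: an index is as healthy as its worst shard, and the cluster is as healthy as its worst index. As a simplified sketch of that rule (the function es_color and its two counters are hypothetical, not an ElasticSearch API):

```shell
#!/bin/sh
# Hypothetical helper modelling the documented rule:
#   red    = at least one primary shard is unassigned
#   yellow = all primaries are assigned, but at least one replica is not
#   green  = every shard (primary and replica) is assigned
# Arguments: number of unassigned primaries, number of unassigned replicas.
es_color() {
  unassigned_primaries=$1
  unassigned_replicas=$2
  if [ "$unassigned_primaries" -gt 0 ]; then
    echo red
  elif [ "$unassigned_replicas" -gt 0 ]; then
    echo yellow
  else
    echo green
  fi
}

# The health output above has no unassigned primaries (active_primary_shards
# is 1) and one unassigned replica:
#   es_color 0 1   # prints yellow
```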
Shards and replicas
In this blog, I intended to write an entire course on what shards are and how ElasticSearch works. But let me simplify things by saying that ElasticSearch is a name with two words in it: the search engine becomes elastic because of its clustering capabilities. You could start off with a single ElasticSearch server.
But instead of increasing the resources of that server (CPU, memory, etc.), you could also add another server to the ElasticSearch cluster. Hence the health of a specific ElasticSearch server is measured as part of the health of a cluster. Now, clustering comes with various benefits, like performance but also fail-over. If one node dies, another takes over. But if all relevant data are saved on the first node only, then there is nothing for the second node to take over. To solve this, ElasticSearch uses replicas: each piece of searchable data (a shard) is copied into another shard (a replica) on a different node.
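But this only makes sense if you have multiple servers. ElasticSearch never allocates a replica to the same node as its primary shard, so on a single server the replica simply remains unassigned: that is the "unassigned_shards" : 1 in the health output above. Having replicas of shards while you have only one server does not make much sense.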
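The numbers in the health output can be reproduced with a bit of arithmetic. The sketch below uses a simplified model, assuming only that every copy of a shard needs its own node and ignoring other allocation rules (such as disk watermarks). With N data nodes, each primary can have at most N - 1 of its replicas assigned. The function es_expected_shards is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: estimate shard allocation in a simplified model where
# every copy of a shard (primary or replica) must live on a different node.
# Arguments: number of data nodes, number of primary shards, replicas per primary.
es_expected_shards() {
  nodes=$1
  primaries=$2
  replicas=$3
  # Per primary, at most (nodes - 1) replicas fit next to the primary itself
  placeable=$((nodes - 1))
  missing=$((replicas - placeable))
  if [ "$missing" -lt 0 ]; then
    missing=0
  fi
  total=$((primaries * (1 + replicas)))
  unassigned=$((primaries * missing))
  active=$((total - unassigned))
  # Integer percentage (rounded down); ElasticSearch itself reports a float
  if [ "$total" -eq 0 ]; then
    percent=100
  else
    percent=$((100 * active / total))
  fi
  echo "total=$total active=$active unassigned=$unassigned percent=$percent"
}
```

For one node, one primary and the default of one replica, es_expected_shards 1 1 1 prints total=2 active=1 unassigned=1 percent=50, matching active_shards, unassigned_shards and active_shards_percent_as_number above. With zero replicas (es_expected_shards 1 1 0) nothing stays unassigned.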
Yellow, the new green
Most Magento environments I encounter are low-endish and therefore quite straightforward. Sometimes, all services (Nginx, PHP-FPM, MySQL, Redis and ElasticSearch) are combined on a single server. Sometimes, a separate MySQL server or ElasticSearch server is used. And if resources are short for MySQL and ElasticSearch, the hardware is upgraded. Only in larger enterprise setups will you see multiple ElasticSearch nodes in a cluster, and only in that specific situation do replicas make sense.
Most ElasticSearch server configurations that I encounter are therefore single servers, and their cluster status is always yellow. This is simply accepted as normal. Question: Why is the cluster yellow? Answer: Oh, it's supposed to be like that. Yellow has become the new green.
Going green anyway
There is an alternative though: you can also tell ElasticSearch that there are not supposed to be any replicas. Unfortunately, the discovery.type=single-node option does not do this for you (though it does save you some work, because node discovery is disabled). Instead, you will need to update the settings of every index. Fortunately, you can also use the _template API to change the template for every index that is created afterwards. For instance, this can be accomplished with the following curl command from the CLI:
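# index_patterns is required by ElasticSearch 7; "*" applies the template to every new index
curl -XPUT 'http://localhost:9200/_template/default' -H 'Content-Type: application/json' -d '{
  "index_patterns": ["*"],
  "settings": {
    "number_of_shards": "1",
    "number_of_replicas": "0"
  }
}'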
The number_of_shards is set to 1, because splitting up data into multiple shards on a single server makes little sense. And number_of_replicas is set to 0, which effectively disables replicas. As far as I understand the manual, this does not improve performance much. But as soon as indexing starts, the cluster health at least goes from red directly to green. It's not easy being green.
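The template only affects indices that are created afterwards. For indices that already exist, number_of_shards can no longer be changed, but number_of_replicas is a dynamic setting that can be updated in place via the index settings API. The following is a sketch, assuming curl is installed and ElasticSearch is reachable at $ES_HOST (defaulting to http://localhost:9200); the function names are hypothetical:

```shell
#!/bin/sh
# Sketch: remove replicas from indices that already exist.
# Assumptions: curl is installed and ElasticSearch is reachable at $ES_HOST.
# Function names are hypothetical.
ES_HOST="${ES_HOST:-http://localhost:9200}"

# number_of_replicas is a dynamic index setting, so it can be changed on
# existing indices; number_of_shards cannot be changed this way.
es_disable_replicas() {
  curl -s -X PUT "$ES_HOST/_all/_settings" \
    -H 'Content-Type: application/json' \
    -d '{"index": {"number_of_replicas": 0}}'
}

# List every index with its primary count, replica count and health color.
es_list_indices() {
  curl -s "$ES_HOST/_cat/indices?v&h=index,pri,rep,health"
}

# Usage:
#   es_disable_replicas
#   es_list_indices
```

After running es_disable_replicas, the rep column of es_list_indices should show 0 and the health column should turn green.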
About the author
Jisse Reitsma is the founder of Yireo, extension developer, developer trainer and 3x Magento Master. His passion is for technology and open source. And he loves talking as well.