
Is Elasticsearch Set/Get Eventually Consistent?

Introduction
Elasticsearch does much of the heavy lifting of horizontal scalability for us: it manages failures, nodes, and shards. I started using it a few days ago in a new project I was working on, and I wanted to know whether its SET/GET operations are eventually consistent. My first thought was: well, it's a NoSQL store with replicas, so it should be eventually consistent. But then I read the documentation, which led me to some interesting insights, at least about whether the client that writes the data will see eventual consistency when it reads from a replica. First, allow me to summarize some of the concepts I have learned, and then I will say what I think about SET/GET eventual consistency. (I also ran a local cluster test to confirm it.)
cluster.name
Nodes with the same cluster name belong to the same cluster.
The cluster reorganizes itself as we add or remove nodes or data, meaning it moves shards between nodes if needed.
Master Node
One node is elected as the master node. It is not involved in searching; it is in charge of creating and deleting indexes and adding/removing nodes from the cluster. It is just a manager.
Any Node
You can send search and indexing requests to any node, including the master. The entry-point node knows where the data resides, so it communicates with the right nodes to get and set the data and returns the results to us.
Index
A logical namespace that points to one or more shards. It's like a database in a relational database: an index groups together one or more shards.
1 index -> multiple shards # => one index can have one or more shards; it's like a database.
shard # => documents are stored in shards. a single instance of Lucene. a complete search engine in its own right.
application -> index -> shard # => applications talk to shards via indexes, which are logical namespaces pointing to shards.
cluster grows # => shards are moved between nodes.
primary shard # => a document lives on a **single** primary shard. data is only on one primary shard.
replica shard # => a copy of the primary shard; in case of hardware failure on the primary it takes over, and it also serves read requests (read/get).
number of shards # => you can have multiple primary shards per index.
who handles what # => read/search is handled by either the primary or a replica; the more copies, the higher the read throughput.
concurrency # => if two processes read 50, increment by one, and store, we can end up with 51 and not 52. elasticsearch uses optimistic concurrency control (versioning).
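The lost-update scenario above can be sketched as a small simulation. This is an illustration of the versioning idea, not the Elasticsearch API itself; the `Store` class and its names are hypothetical.

```python
# Minimal simulation of optimistic concurrency control via versioning.
# This models the idea only; it is not the Elasticsearch client API.

class VersionConflict(Exception):
    pass

class Store:
    def __init__(self):
        self.value = 50
        self.version = 1

    def get(self):
        return self.value, self.version

    def update(self, new_value, expected_version):
        # Reject the write if someone else updated the doc since we read it.
        if expected_version != self.version:
            raise VersionConflict(f"expected v{expected_version}, found v{self.version}")
        self.value = new_value
        self.version += 1

store = Store()

# Two clients read the same value and version concurrently.
v1_value, v1_version = store.get()
v2_value, v2_version = store.get()

store.update(v1_value + 1, v1_version)      # client 1 wins: 50 -> 51

try:
    store.update(v2_value + 1, v2_version)  # client 2 conflicts instead of silently writing 51
except VersionConflict:
    value, version = store.get()
    store.update(value + 1, version)        # retry against the fresh version: 51 -> 52

print(store.value)  # 52, not 51
```

With the version check, the second writer gets a conflict and retries, so no increment is lost.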
Distributed Document Store
When you index a document it is stored on a single primary shard.
shard = hash(routing) % number_of_primary_shards
This explains why the number of primary shards can be set only when an index is created and never changed: if the number of primary shards ever changed in the future, all previous routing values would be invalid and documents would never be found.
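The routing formula can be sketched in a few lines. Note the assumptions: `zlib.crc32` stands in for the hash Elasticsearch actually uses, and the `shard_for` helper and document ids are made up for illustration.

```python
# Sketch of shard = hash(routing) % number_of_primary_shards.
# zlib.crc32 is a stand-in hash; the routing value defaults to the document id.
import zlib

def shard_for(routing: str, number_of_primary_shards: int) -> int:
    return zlib.crc32(routing.encode()) % number_of_primary_shards

# With 5 primary shards, every document id maps deterministically to one shard.
print(shard_for("user-42", 5))

# Changing the shard count remaps most documents, which is why the number of
# primary shards is fixed at index creation time.
moved = sum(shard_for(f"doc-{i}", 5) != shard_for(f"doc-{i}", 6) for i in range(1000))
print(f"{moved} of 1000 documents would land on a different shard after a resize")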
coordinating node # => the node that receives our request; it forwards it to the correct node for the read/write.
Create, index, and delete requests are write operations, which must be successfully completed on the **primary shard** before they can be copied to any associated replica shards. The client gets an OK only if the operation finished successfully on the primary shard.
parameters/configuration
replication # => sync: wait for successful responses from replicas. async: report success as soon as the primary has finished. avoid async...
quorum # => by default, a write requires a quorum (a majority of shard copies) to be **available** before attempting the write.
read miss # => while a document is being indexed it may exist on the primary but not yet be copied to a replica; the replica will report that the document does not exist, while the primary would return it successfully. in that sense a read is not consistent but eventually consistent.
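The read-miss window can be made concrete with a toy model. This is not how Elasticsearch is implemented; the dictionaries and function names below are illustrative assumptions standing in for the primary shard, a replica, and the replication step.

```python
# Toy model of the replication window: the primary has the document,
# the replica has not applied the change yet. Names are illustrative.

primary = {}
replica = {}
pending_replication = []

def index(doc_id, doc):
    primary[doc_id] = doc                      # the write lands on the primary first
    pending_replication.append((doc_id, doc))  # ...and is shipped to replicas later

def get(shard, doc_id):
    return shard.get(doc_id)                   # a GET may be served by any copy

def apply_replication():
    while pending_replication:
        doc_id, doc = pending_replication.pop(0)
        replica[doc_id] = doc

index("1", {"title": "hello"})

print(get(primary, "1"))   # {'title': 'hello'} -> the primary sees it
print(get(replica, "1"))   # None               -> the replica does not, yet

apply_replication()        # once replication completes, the reads converge
print(get(replica, "1"))   # {'title': 'hello'}
```

A client that happens to read the replica inside that window sees a miss; once replication completes, all copies agree, which is exactly the "eventually consistent" behavior described above.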
Now - Is Elasticsearch SET/GET Read Eventually Consistent?
Elasticsearch reads are eventually consistent, but they can also be consistent :). The realtime flag is per shard, so if we have a replica shard that has not received the data yet, then even though the read is realtime we won't get the most recent data; at most we would get the data in that shard's transaction log.
realtime: true + replication: sync ==> read consistent for the same client # => because sync replication means the primary waits for the written data to be replicated to all replicas.
How did I get to that conclusion? See the documentation:
replication The default value for replication is sync. This causes the primary shard to wait for successful responses from available replica shards before returning.
In addition it says:
By the time the client receives a successful response, the document change has been executed on the primary shard and on all replica shards. Your change is safe.
Now the documentation also says this:
It is possible that, while a document is being indexed, the document will already be present on the primary shard but not yet copied to the replica shards. In this case, a replica might report that the document doesn’t exist, while the primary would have returned the document successfully. Once the indexing request has returned success to the user, the document will be available on the primary and all replica shards.
So it's possible for the document to be only on the primary and not on the replicas. That makes sense: we managed to set the document on the primary and the replicas haven't received it yet. But in this case, as the section above says, the client would not get an OK response yet.
Now there is also the realtime flag in the story:
The translog is also used to provide real-time CRUD. When you try to retrieve, update, or delete a document by ID, it first checks the translog for any recent changes before trying to retrieve the document from the relevant segment. This means that it always has access to the latest known version of the document, in real-time.
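The "check the translog before the segments" behavior quoted above can be sketched like this. Again, this is a simplified model under my own assumptions, not the real storage engine; `refresh` here compresses what is really a refresh/flush cycle into one step.

```python
# Sketch of real-time GET by id: recent changes live in the translog until a
# refresh moves them into a searchable segment. The structure is illustrative.

translog = {}   # most recent, not-yet-refreshed operations, keyed by doc id
segments = {}   # documents already visible to search

def index(doc_id, doc):
    translog[doc_id] = doc          # every write is recorded in the translog first

def refresh():
    segments.update(translog)       # a refresh makes the changes searchable...
    translog.clear()                # ...after which the translog entries can go

def get(doc_id):
    # Real-time GET: consult the translog first, then fall back to segments.
    if doc_id in translog:
        return translog[doc_id]
    return segments.get(doc_id)

index("1", {"count": 52})
print(get("1"))   # {'count': 52} even before any refresh
refresh()
print(get("1"))   # still {'count': 52} once it lives in a segment
```

This is why a GET by id can return a document that a search would not yet find: the GET path consults the translog, while search only sees refreshed segments.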
To the client that waits until the data is replicated, the read is consistent: with sync replication, a success result is returned to the client only after the write was replicated. Together with the realtime flag, this ensures that even if the operation is still only in the transaction log, it will be returned to the client. But if I'm client2, which did not do the write, I might hit exactly the window where the write finished on the primary but was not yet replicated to the replicas; in that case the read is eventually consistent. Of course, I encourage you to tell me if you think this is not the case :)

BOOK: If you are interested in a more developer-oriented discussion of Elasticsearch, and not just the admin side, then the best book I have found for it is "Elasticsearch Essentials".


