
Scalability CheatSheet - Paxos

Scalability CheatSheet - Part 3 - Paxos

We like journaling, seriously. It helps us avoid data corruption: if you update data in place, the update can fail halfway (an electricity shutdown, whatever). A journal is append-only, so nothing already written can be corrupted, only the entry you are appending, and if that entry is corrupted you simply don't consider it appended.
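
Here is a minimal sketch of the "a corrupted append does not count" rule. It assumes each entry is stored on one line as a CRC32 checksum plus a JSON body; that record format and the names `encode_record` and `decode_journal` are made up for illustration, not taken from any particular journaling system.

```python
import json
import zlib


def encode_record(payload: dict) -> str:
    """Serialize one journal entry as '<crc32 hex><TAB><json>' on a single line."""
    body = json.dumps(payload, sort_keys=True)
    checksum = zlib.crc32(body.encode("utf-8"))
    return f"{checksum:08x}\t{body}"


def decode_journal(lines: list[str]) -> list[dict]:
    """Return the valid prefix of the journal, stopping at the first bad record."""
    records = []
    for line in lines:
        checksum, sep, body = line.partition("\t")
        if not sep:
            break  # torn write: the separator never made it to disk
        try:
            if int(checksum, 16) != zlib.crc32(body.encode("utf-8")):
                break  # checksum mismatch: treat the entry as never appended
            records.append(json.loads(body))
        except ValueError:
            break  # unreadable checksum or body
    return records


journal = [
    encode_record({"key": "x", "value": 1}),
    encode_record({"key": "x", "value": 2}),
]
# Simulate a power cut in the middle of the third append: the line is cut short.
journal.append(encode_record({"key": "x", "value": 3})[:-4])

valid = decode_journal(journal)
print(valid)
```

The two complete entries survive and the torn third entry is ignored, so the earlier data is never touched by the failed write.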

Reading is now harder, because you need to read the whole journal. So from time to time you create a snapshot of the state. From then on, a read takes the snapshot and augments it with the journal entries appended after it.
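
The snapshot idea can be sketched roughly like this. It assumes the journal is a list of key/value writes and the state is a plain dictionary; `take_snapshot` and `read_state` are hypothetical helper names.

```python
def apply_entry(state: dict, entry: dict) -> None:
    """Apply one journal entry (a key/value write) to the state."""
    state[entry["key"]] = entry["value"]


def take_snapshot(journal: list[dict], upto: int) -> dict:
    """Fold the first `upto` journal entries into a snapshot."""
    state: dict = {}
    for entry in journal[:upto]:
        apply_entry(state, entry)
    return {"state": state, "next_index": upto}


def read_state(snapshot: dict, journal: list[dict]) -> dict:
    """Start from the snapshot and replay only the entries appended after it."""
    state = dict(snapshot["state"])  # copy, so the snapshot stays immutable
    for entry in journal[snapshot["next_index"]:]:
        apply_entry(state, entry)
    return state


journal = [
    {"key": "a", "value": 1},
    {"key": "b", "value": 2},
    {"key": "a", "value": 3},
    {"key": "c", "value": 4},
]
snapshot = take_snapshot(journal, upto=2)   # covers the first two entries
current = read_state(snapshot, journal)     # replays only the last two
print(current)
```

The read gives the same result as replaying the whole journal, but it only walks the entries written after the snapshot.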

Just Read - When you read, you just read. You don't lock, you don't care about the world, it's like you are high. Existing entries are immutable, so reading does not disturb any writes. All is cool, dude, we are reading.

Collision - What happens if two distributed machines try to write at the very same time: same timestamp, same key, different content? OMG, we have a problem even with append-only journaling. Let's scratch our heads and see what we can do. Calling Turing might be a good idea, but he hasn't answered any of my calls for the last couple of years, so we'll have to figure something out ourselves.

Eventually Consistent - Less is the new more - A secret weapon we came upon, which basically means: we can't have the best of all worlds, and we don't need to. For our cases we are going to settle for less. Each participant has its own local, strongly consistent store, and participants update each other about changes. With eventual consistency you build a map of all the nodes, notify the others when something happens, and can change routes; no node has precedence over the others that hold the same data.
The non-real last state - Let us stress it again: it is impossible for anyone to know the current, real state. That makes a proper read-modify-write impossible, that is, one based on the most up-to-date correct information and free of collisions. In addition, it is vulnerable to network failures. But we already agreed, less is more! And the more here is that it's much more 
Paxos strong consistency - The whole goal of Paxos is to reach consensus among the members.

The Part-Time Parliament - Quorum - Paxos works by reaching an agreement among a chosen set of members; for example, you need more than half of the nodes to agree. Paxos uses the part-time parliament example: in order to make a change, you have to get the agreement of a majority of the Paxons. Any two sets of Paxons that are each larger than half the group must share at least one member, so two conflicting changes can never both be approved.
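
The claim that two majorities always overlap can be checked by brute force for a small group. This is only an illustration; the node names and both helper functions are invented for the example.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of members that is more than half of n."""
    return n // 2 + 1


def every_pair_of_majorities_overlaps(nodes: list[str]) -> bool:
    """Check that any two minimal majorities share at least one member.

    Larger majorities contain a minimal one, so checking the minimal
    ones is enough.
    """
    size = majority(len(nodes))
    quorums = [set(group) for group in combinations(nodes, size)]
    return all(a & b for a in quorums for b in quorums)


paxons = ["p1", "p2", "p3", "p4", "p5"]
print(majority(len(paxons)))                      # 3
print(every_pair_of_majorities_overlaps(paxons))  # True
```

The shared member is what stops two conflicting decisions from both passing: it would have to agree to both.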
Paxos is also for reading - Likewise, when you read something you want to know that you are getting the latest and greatest revision, so you need a majority of the Paxons to agree on the latest revision.
So in a Paxos read the client asks all nodes for a value, and an answer is valid only when a majority of all nodes agree on it. Yes, a majority of all nodes, at least in the naive algorithm. There is no canonical place to store the answer. This is naive Paxos; there are better variants.
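
A rough sketch of the naive read described above, assuming the client has already collected replies into a dictionary of node name to value. `quorum_read` is a hypothetical helper, and nodes that did not answer are simply missing from the dictionary.

```python
from collections import Counter
from typing import Optional


def quorum_read(replies: dict[str, str], cluster_size: int) -> Optional[str]:
    """Return the value a majority of ALL nodes agree on, or None if there is none."""
    if not replies:
        return None
    value, votes = Counter(replies.values()).most_common(1)[0]
    # The majority is counted against the whole cluster, not just the repliers.
    return value if votes > cluster_size // 2 else None


# Five nodes: three report "v2", one is stale, one did not answer.
ok = quorum_read({"n1": "v2", "n2": "v2", "n3": "v2", "n4": "v1"}, cluster_size=5)
# Five nodes: a 2/2 split and one silent node, so no value has a majority.
failed = quorum_read({"n1": "v2", "n2": "v2", "n3": "v1", "n4": "v1"}, cluster_size=5)
print(ok, failed)
```

In the second case the read fails rather than guessing, which is the price of refusing to return a possibly stale value.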
Paxos Write - (ask → promise → accept) - Here the client contacts a random node and asks it to write a value. The node takes the value, attaches a sequence number and sends prepare(value, seq) to the other nodes. Each receiving node checks that seq is the highest it has seen before it agrees to the proposal. If clients skipped this step and sent their proposals straight to all nodes, two clients doing so at the same time could each end up with half the system agreeing.
Paxos has a way to generate sequence numbers that only grow with time. If a proposal's number is the highest a node has seen, the node promises to accept it. The proposer then counts the promises: if more than half of the nodes promised before the timeout, it asks the nodes that promised to accept. If only some nodes manage to accept the value, reads will not get a majority and will fail. It sucks, but at least we are not in an inconsistent state.
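
The write flow can be sketched as an in-memory simulation. Note the assumptions: this follows classic single-decree Paxos, where the prepare message carries only the sequence number and the value travels with the accept message, which differs slightly from the simplified prepare(value, seq) description above. There is no network or timeout here, and `Acceptor` and `propose` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Acceptor:
    promised: int = -1                    # highest sequence number promised so far
    accepted_seq: int = -1                # sequence number of the accepted value, if any
    accepted_value: Optional[str] = None
    up: bool = True                       # False simulates a crashed or unreachable node

    def prepare(self, seq: int):
        """Phase 1: promise to ignore proposals numbered below seq."""
        if not self.up or seq <= self.promised:
            return None
        self.promised = seq
        return (self.accepted_seq, self.accepted_value)

    def accept(self, seq: int, value: str) -> bool:
        """Phase 2: accept unless a higher number was promised in the meantime."""
        if not self.up or seq < self.promised:
            return False
        self.promised = seq
        self.accepted_seq = seq
        self.accepted_value = value
        return True


def propose(acceptors: list, seq: int, value: str) -> Optional[str]:
    """Run one proposal; return the chosen value, or None if no majority was reached."""
    quorum = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(seq) for a in acceptors) if p is not None]
    if len(promises) < quorum:
        return None
    # If some acceptor already accepted a value, re-propose the newest such value.
    _, prior_value = max(promises, key=lambda p: p[0])
    if prior_value is not None:
        value = prior_value
    accepted = sum(a.accept(seq, value) for a in acceptors)
    return value if accepted >= quorum else None


cluster = [Acceptor() for _ in range(5)]
first = propose(cluster, seq=1, value="A")    # chosen: "A"
second = propose(cluster, seq=2, value="B")   # still "A": a chosen value sticks
stale = propose(cluster, seq=1, value="C")    # rejected: sequence number too low

broken = [Acceptor(), Acceptor(), Acceptor(up=False), Acceptor(up=False), Acceptor(up=False)]
no_quorum = propose(broken, seq=1, value="A") # only two promises out of five
print(first, second, stale, no_quorum)
```

The second proposal returning "A" instead of "B" is the point of the promise phase: once a majority has accepted a value, later proposers learn it and carry it forward instead of overwriting it.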
Paxos cost - A write now requires consensus among a majority of the members; you can no longer just write to your local journal.

Master election - Master election is just another example of agreeing on something; in this case you agree on a master. The election itself goes through an expensive, strongly consistent store that decides who is in charge. But once you all agree on a master, your processes get simpler: when you need to read or write, you just contact the master, which eliminates a further agreement on every read and write.





