Skip to main content

Hadoop CheatSheet

Introduction
We have decided to aggregate in a single post the most important things to know about hadoop in a concise way. Let’s us know if you have any comments!


Hadoop

##########
## HDFS ##
##########

NameNode # => Managing filesystem namespace, if you loose it you have no pointers to your data, you practially lost your data.

DataNode # => You know it holds data, installed on each worker.

Block # => Each file split to B1,B2,.. where each block size 128MB replication is on blocks.  Name node knows that File X is split to B1,B2 and where.

##########
## YARN ##
##########

ResourceManager # => Like `NameNode` for computing, tracks NodeManagers and how available they are for work.

NodeManager # => Like `Datanode` for computing, offer computational resources run applications tasks in containers.

ApplicationMaster # => Each application has `ApplicationMaster` process which negotiates resources with `ResourceManager` which delivers a `container` descriptor back to `ApplicationMaster` processa and asks `NodeManager` to launch the `container.`
 
 ################
 ## Map Reduce ##
 ################
 
 Map(k1, v1) --> list(k2, v2) # => map takes keyvalue pair and produces zero or more intermediate keyvalue pairs
 
 Recduce(k2, list(v2)) --> list(k3, v3) # => Reduce take a single key and list of values and produces zero or more keyvalue, usually aggregation.

Summary

We kept that post small so you could rest :), but in general we went through what NameNode, DataNode, Block, ResourceManager, NodeManager, ApplicationMaster, Task are in a very short and concise way, isn’t that just great :) If you liked it please hit the share button below and leave a comment for any comment! :)




Comments

Post a Comment

Popular posts from this blog

Dev OnCall Patterns

Introduction Being On-Call is not easy. So does writing software. Being On-Call is not just a magic solution, anyone who has been On-Call can tell you that, it's a stressful, you could be woken up at the middle of the night, and be undress stress, there are way's to mitigate that. White having software developers as On-Calls has its benefits, in order to preserve the benefits you should take special measurements in order to mitigate the stress and lack of sleep missing work-life balance that comes along with it. Many software developers can tell you that even if they were not being contacted the thought of being available 24/7 had its toll on them. But on the contrary a software developer who is an On-Call's gains many insights into troubleshooting, responsibility and deeper understanding of the code that he and his peers wrote. Being an On-Call all has become a natural part of software development. Please note I do not call software development software engineering b

Containers - Quick Low Level Guide

Containers Kernel, namespace, cgroups Kernel space and user space Before we actually get to explain containers let's define what is a kernel.  Because you know there is no such thing in reality as a kernel it's only how we name things, and different people name things differently. cgroups, namespaces, UFS We are going to discuss containers, cgroups, namespace, UFS, hypervisor, user space, kernetl space and more.   When we say "kernel" we mean this.  We have the hardware, this is not the kernel, now above the hardware we have a few layers of software, imagine now two boxes. User mode is all the application you run while the kernel is the lower level is all the virtual memory management scheduling, connection to hardware devices, network drivers, it's basically the abstraction on top of the hardware + the basic services which allow this. One box is closer to the hardware and contains a few layers, the second box sits on top of the kernel box and contains

Recursion Trees Primer

Recursion trees. Controlling the fundamentals stands at the cornerstone of controlling a topic.  In our case in order to be a good developer its not enough or even not at all important to control the latest Java/JavaScript/big data technology but what's really important is the basics.  And the basics in computer science are maths, stats, algorithms and computer structure. Steve wosniak the co-founder of apple said the same, what gave him his relative advantage was his deep understanding of programming and computer structure, this is what gave him the ability to create computer's which are less costly than the competitors (not that there were many) and by the way there were 3 founders to apple company one responsible for the technical side, one for the product and sales (Steve Jobs) and the third responsible for the company structure and growth, each of the three extremely important, it was not only the two Steve's but that's a topic for another episode. And with t