
Apache Spark Components CheatSheet

Confused by Spark concepts such as executors, nodes, RDDs and tasks? Invest just two minutes of your time to bring some order to this mess!


I'll clear up these Apache Spark concepts for you!

Spark building blocks: executor, tasks, cache, SparkContext, cluster manager


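Executor => Multiple Tasks: An executor is a JVM process running on a worker node. It receives serialized tasks from the driver (pieces of your code, backed by your application JAR), deserializes them, and runs them, several at a time, each task in its own thread.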
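Here is a toy Python model of that flow. It is not Spark's API, only an illustration. A hypothetical driver pickles a function together with one partition of data into bytes. A `ToyExecutor`, assumed to have a fixed number of "cores", deserializes each task and runs it on a thread pool. Real Spark ships tasks over the network and runs them inside executor JVMs. The names `make_task`, `run_task`, `sum_partition` and `ToyExecutor` are made up for this sketch.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor

# Hypothetical task body: the work to run over one partition.
def sum_partition(partition):
    return sum(partition)

def make_task(func, partition):
    """Driver side (toy): serialize the code and its partition into bytes."""
    return pickle.dumps((func, partition))

def run_task(task_bytes):
    """Executor side (toy): deserialize the task and run it."""
    func, partition = pickle.loads(task_bytes)
    return func(partition)

class ToyExecutor:
    """Hypothetical stand-in for one executor: a pool of task threads."""

    def __init__(self, cores):
        self.pool = ThreadPoolExecutor(max_workers=cores)

    def run_all(self, tasks):
        # map() keeps results in the same order as the submitted tasks.
        return list(self.pool.map(run_task, tasks))

    def shutdown(self):
        self.pool.shutdown()

partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
tasks = [make_task(sum_partition, p) for p in partitions]
executor = ToyExecutor(cores=2)
results = executor.run_all(tasks)
executor.shutdown()
print(results)  # [6, 9, 30]
```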

Executors also provide in-memory storage for cached data. Tasks that reuse a cached RDD don't have to recompute it, so they run faster.
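This is a minimal sketch of why caching helps. It assumes a hypothetical `ToyBlockCache` that keeps each computed partition in memory under an (RDD id, partition index) key. In real Spark you request caching with `cache()` or `persist()` on an RDD. The class below only mimics the idea: the second request is served from memory instead of being recomputed.

```python
class ToyBlockCache:
    """Hypothetical executor-side cache: computed partitions kept in memory."""

    def __init__(self):
        self.blocks = {}
        self.computations = 0  # how many times we actually computed

    def get_or_compute(self, rdd_id, partition_index, compute):
        key = (rdd_id, partition_index)
        if key not in self.blocks:
            self.computations += 1
            self.blocks[key] = compute()
        return self.blocks[key]

def expensive_parse(lines):
    return [int(x) for x in lines]

cache = ToyBlockCache()
raw = ["1", "2", "3"]
first = cache.get_or_compute("numbers", 0, lambda: expensive_parse(raw))
second = cache.get_or_compute("numbers", 0, lambda: expensive_parse(raw))
print(first, second, cache.computations)  # [1, 2, 3] [1, 2, 3] 1
```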

Node => Multiple Executors: Each worker node can host one or more executors.

RDD => Big Data Structure: An RDD (Resilient Distributed Dataset) can represent data that is too big to store on a single machine. Its data is partitioned and distributed across computers.
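The sketch below shows what "partitioned and distributed" means, using plain Python lists. `slice_into_partitions` cuts a collection into contiguous, nearly equal slices. This is similar in spirit to how `SparkContext.parallelize` slices a local collection. `place_on_nodes` and the node names are hypothetical. Its round-robin placement is a simplification, because Spark's scheduler decides where tasks run based on data locality and free executors.

```python
def slice_into_partitions(data, num_partitions):
    """Split a list into contiguous, nearly equal partitions."""
    n = len(data)
    return [data[i * n // num_partitions:(i + 1) * n // num_partitions]
            for i in range(num_partitions)]

def place_on_nodes(partitions, nodes):
    """Hypothetical placement: spread partitions round-robin over nodes."""
    placement = {node: [] for node in nodes}
    for index, part in enumerate(partitions):
        placement[nodes[index % len(nodes)]].append((index, part))
    return placement

data = list(range(10))
parts = slice_into_partitions(data, 3)
print(parts)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
placement = place_on_nodes(parts, ["node-a", "node-b"])
print(placement)
# {'node-a': [(0, [0, 1, 2]), (2, [6, 7, 8, 9])], 'node-b': [(1, [3, 4, 5])]}
```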

Input => RDD: Every chain of RDDs starts from some input, such as a text file, a Hadoop-supported file, or a collection in your driver program.

Output => RDD: Transformations in Spark (such as map or filter) take an RDD as input and produce a new RDD as output. You chain them one after another, each one receiving an RDD and returning a new one. It's functional.
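Here is a toy Python model of that chaining. The `ToyRDD` class is hypothetical, not PySpark. Each transformation returns a new object and leaves its input untouched. For simplicity this version computes eagerly, whereas real RDD transformations are lazy.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: an immutable tuple of partitions."""

    def __init__(self, partitions):
        self._partitions = tuple(tuple(p) for p in partitions)

    def map(self, f):
        # Returns a NEW ToyRDD; self is never modified.
        return ToyRDD([[f(x) for x in p] for p in self._partitions])

    def filter(self, pred):
        return ToyRDD([[x for x in p if pred(x)] for p in self._partitions])

    def collect(self):
        # Flatten partitions in order into one list.
        return [x for p in self._partitions for x in p]

lines = ToyRDD([["spark", "is"], ["fast", "and", "lazy"]])
lengths = lines.map(len)
long_ones = lengths.filter(lambda n: n > 3)
print(lines.collect())      # ['spark', 'is', 'fast', 'and', 'lazy']
print(long_ones.collect())  # [5, 4, 4]
```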

RDD[T]: RDDs are typed. An RDD[T] holds data of a certain type T, and key-value data is an RDD[(K, V)].

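RDD => 1,2,3: RDDs have an order. Partitions are indexed, and collect() returns elements partition by partition. Don't rely on element order after a shuffle, though. If order matters, sort explicitly, for example with sortBy.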

RDD => Zzzz: RDDs are lazily evaluated. We said functional, didn't we? You can apply multiple transformations to your data, and Spark only records them. The computation runs only when you call an action (such as count or collect) that needs the actual data.
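Here is a minimal sketch of lazy evaluation, using a hypothetical `LazyToyRDD` class rather than Spark itself. `map` and `filter` only append a step to a recorded list, loosely like an RDD's lineage. Nothing runs until the `collect` action walks that list. The `calls` list records when `double` actually executes.

```python
class LazyToyRDD:
    """Hypothetical lazy RDD: transformations are recorded, not executed."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = steps  # recorded transformations (toy "lineage")

    def map(self, f):
        return LazyToyRDD(self._source, self._steps + (("map", f),))

    def filter(self, pred):
        return LazyToyRDD(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: only now is any work done.
        result = list(self._source)
        for kind, f in self._steps:
            if kind == "map":
                result = [f(x) for x in result]
            else:
                result = [x for x in result if f(x)]
        return result

calls = []

def double(x):
    calls.append(x)  # record that the transformation really ran
    return x * 2

pipeline = LazyToyRDD([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
print(calls)               # []  (nothing has run yet)
print(pipeline.collect())  # [6, 8]
print(calls)               # [1, 2, 3, 4]
```

Calling `collect()` a second time would run `double` again, because nothing is cached. That is where the executor cache described earlier comes in.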

RDD => Partitioned: RDDs are partitioned across servers. We said it's big data, so it has to be split into partitions that live on different machines.
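For key-value data, a common way to decide which partition a record belongs to is hash partitioning: hash the key and take it modulo the number of partitions. Spark's `HashPartitioner` does this with the key's `hashCode`. The Python sketch below mimics the idea with hypothetical helper names. It uses `zlib.crc32` as a stable hash, because Python's built-in string `hash()` changes between processes.

```python
import zlib

def partition_for(key, num_partitions):
    """Toy hash partitioner: stable hash of the key, modulo N."""
    # crc32 is a stand-in; Spark's HashPartitioner uses the key's hashCode.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions

def hash_partition(pairs, num_partitions):
    """Hypothetical helper: route each (key, value) pair to its partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions

pairs = [("apple", 1), ("banana", 2), ("apple", 3), ("cherry", 4)]
parts = hash_partition(pairs, 2)
print(parts)
```

The partition depends only on the key, so every record with the same key lands in the same partition. This lets per-key work happen within one partition at a time.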

RDD => Array(thing1, thing2, thing3): You can think of an RDD as a bunch of things.

Guys, if anything else confuses you and you'd like me to make a cheatsheet for it, just comment below. I'd also really appreciate any feedback on this post!
