Troubled by confusing Spark concepts such as executors, nodes, RDDs, and tasks? Invest just 2 minutes of your time to bring some order to this mess!
Executor => Multiple Tasks: An executor is a JVM process running on a worker node. Executors receive tasks (your code, serialized and shipped to them along with your jars), deserialize them, and run them.
Executors cache data so that later tasks can reuse it and run faster.
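To make the caching idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: `ToyExecutor`, `run_task`, and `expensive_load` are hypothetical names invented for this illustration. It only shows why a second task over the same data is cheaper once the first result is kept around. In real Spark you would request this behaviour with `rdd.cache()` or `rdd.persist()`.

```python
class ToyExecutor:
    """Hypothetical stand-in for an executor that keeps computed partitions in memory."""

    def __init__(self):
        self._cache = {}        # partition id -> computed data
        self.computations = 0   # how many times we really did the work

    def run_task(self, partition_id, compute):
        # Only compute the partition if it is not cached yet.
        if partition_id not in self._cache:
            self.computations += 1
            self._cache[partition_id] = compute()
        return self._cache[partition_id]


def expensive_load():
    # Pretend this reads and parses a large chunk of input.
    return [x * x for x in range(5)]


executor = ToyExecutor()
first = executor.run_task(0, expensive_load)
second = executor.run_task(0, expensive_load)  # served from the cache

print(first)                  # [0, 1, 4, 9, 16]
print(executor.computations)  # 1
```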
Node => Multiple Executors: Each node can host multiple executors.
RDD => Big Data Structure: Its main strength is that it represents data too large to store on a single machine, so the data is distributed: partitioned and split across computers.
Input => RDD: Every RDD is born out of some input, such as a text file, Hadoop files, etc.
Output => RDD: Functions in Spark can produce an RDD as output. So it's one function after another: each receives an input RDD and outputs a new RDD. It's functional.
RDD[Type] : RDDs are typed; they hold data of a certain type. An RDD of key/value pairs, for example, is an RDD[(KeyType, ValueType)].
RDD => 1,2,3: RDDs are ordered, although operations that shuffle the data generally do not preserve that order.
RDD => Zzzz: RDDs are lazily evaluated. We said functional, didn't we? So you can chain multiple transformations on your data, and only when you hit an action is the actual data computed.
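Here is a minimal sketch of lazy evaluation in plain Python, so it runs without a Spark install. It is a toy model, not Spark's implementation: `LazyDataset` and `double` are hypothetical names made up for this example, though the method names `map`, `filter`, and `collect` mirror the real RDD operations. Transformations are only recorded, and the work happens when the action `collect` is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them only on an action."""

    def __init__(self, source, steps=()):
        self._source = source  # the input data (a plain list here)
        self._steps = steps    # recorded transformations, not yet applied

    def map(self, fn):
        # Transformation: returns a new dataset and computes nothing.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only recorded.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: now the recorded steps actually run, in order.
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


calls = []

def double(x):
    calls.append(x)  # track when the function really executes
    return x * 2


pipeline = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
print(len(calls))          # 0: nothing has run yet
print(pipeline.collect())  # [6, 8]
print(len(calls))          # 4: the action triggered the work
```

Note that each transformation returns a new `LazyDataset` instead of modifying the old one, which is the "one function after another" style described above.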
RDD => Partitioned: RDDs are partitioned across servers. We said it's big data, so we need to partition it.
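A rough sketch of the partitioning idea in plain Python follows. The helper `split_into_partitions` is hypothetical and its slicing rule is just one simple choice, not necessarily how Spark assigns elements to partitions. The point is that each chunk can be processed on its own and the partial results combined afterwards. In PySpark, `rdd.getNumPartitions()` and `rdd.glom().collect()` let you inspect the real partitions.

```python
def split_into_partitions(items, num_partitions):
    """Split a list into num_partitions contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional element.
        end = start + size + (1 if i < extra else 0)
        partitions.append(items[start:end])
        start = end
    return partitions


data = list(range(10))
partitions = split_into_partitions(data, 3)
print(partitions)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Each partition can be processed independently (in Spark, by a separate task),
# and the partial results are combined at the end.
partial_sums = [sum(p) for p in partitions]
print(partial_sums)       # [6, 15, 24]
print(sum(partial_sums))  # 45
```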
RDD => Array(thing1, thing2, thing3) : You can think of an RDD as a bunch of things.
Guys, if you have any other mess and want me to cheatsheet something for you, just comment below. I would also highly appreciate any comments about this post, so please send me your feedback!
I'll clean up these Apache Spark concepts for you!
Spark building blocks: executor, tasks, cache, SparkContext, cluster manager