Apache Spark Components CheatSheet

Confused by Spark concepts such as Executor, Node, RDD, and Task? Invest just 2 minutes of your time to put some order in this mess!


I'll clean up these Apache Spark concepts for you!

Spark building blocks: executor, tasks, cache, SparkContext, cluster manager


Executor => Multiple Tasks: An executor is a JVM process running on the worker nodes. Executors receive serialized tasks (built from the code in your jars), deserialize them, and run them.

Executors cache data in memory so that tasks that reuse it can run faster.
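To make the cache idea concrete, here is a tiny plain-Python toy. It is not Spark's API: `ToyExecutor`, `run_task` and `_load_partition` are made-up names, and the "slow load" is just pretend. It only shows why a second task on the same data gets cheaper once the data is cached.

```python
from typing import Callable, Dict, List


class ToyExecutor:
    """Toy stand-in for an executor: runs tasks and caches partition data."""

    def __init__(self) -> None:
        self.cache: Dict[int, List[int]] = {}  # partition id -> loaded data
        self.loads = 0  # how many times the slow load actually ran

    def _load_partition(self, partition_id: int) -> List[int]:
        # Pretend this is an expensive read from disk or the network.
        self.loads += 1
        return [partition_id * 10 + i for i in range(3)]

    def run_task(self, partition_id: int, func: Callable[[List[int]], int]) -> int:
        # Reuse the cached partition if an earlier task already loaded it.
        if partition_id not in self.cache:
            self.cache[partition_id] = self._load_partition(partition_id)
        return func(self.cache[partition_id])


executor = ToyExecutor()
total = executor.run_task(0, sum)    # first task has to load partition 0
biggest = executor.run_task(0, max)  # second task reuses the cached data
```

After both tasks, `executor.loads` is still 1: the second task never touched the slow path.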

Node => Multiple Executors: Each node can host multiple executors.

RDD => Big DataStructure: Its main strength is that it can represent data too big to be stored on a single machine, so the data is distributed: partitioned and split across computers.

Input => RDD: Every RDD is born out of some input like a text file, Hadoop files, etc.

Output => RDD: Functions in Spark can produce an RDD as output. So it's one function after another: each receives an input RDD and outputs a new RDD. It's functional.

RDD[Type] : RDDs are typed; they hold data of a certain type. A key-value RDD is typed as RDD[(KeyType, ValueType)].

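RDD => 1,2,3: RDDs keep their elements in order, for example the lines of a text file, although operations that shuffle data between partitions may not preserve that order.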

RDD => Zzzz: RDDs are lazily evaluated. We said functional, didn't we? You chain multiple transformations on your data, and only when you hit an action is the actual data computed.
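If lazy evaluation still feels fuzzy, here is a small plain-Python toy of the idea. It is a sketch and not Spark's implementation: `ToyRDD` is a made-up class, and only the method names `map`, `filter` and `collect` are borrowed from the real RDD API. Transformations just get recorded; the action at the end is what runs them.

```python
from typing import Any, Callable, Iterable, List, Optional


class ToyRDD:
    """Toy lazy collection: transformations are recorded, actions run them."""

    def __init__(self, source: Iterable[Any],
                 steps: Optional[List[Callable]] = None) -> None:
        self._source = source
        self._steps = steps or []

    def map(self, func: Callable[[Any], Any]) -> "ToyRDD":
        # Transformation: returns a new ToyRDD and computes nothing yet.
        return ToyRDD(self._source,
                      self._steps + [lambda it: (func(x) for x in it)])

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        # Transformation: also only recorded.
        return ToyRDD(self._source,
                      self._steps + [lambda it: (x for x in it if pred(x))])

    def collect(self) -> List[Any]:
        # Action: only now do the recorded steps actually run.
        data = iter(self._source)
        for step in self._steps:
            data = step(data)
        return list(data)


seen = []  # records every element that double() really touched


def double(x: int) -> int:
    seen.append(x)
    return x * 2


pipeline = ToyRDD([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before_action = len(seen)  # still 0, nothing has run yet
result = pipeline.collect()      # [6, 8]
```

Building `pipeline` calls `double` zero times; all four calls happen inside `collect()`.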

RDD => Partitioned: RDDs are partitioned between servers. We said it's big data, so we need to partition it.
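Here is a rough plain-Python picture of partitioning. It is a toy under simple assumptions, not how Spark decides partition boundaries: `split_into_partitions` is a made-up helper that slices a list into near-equal contiguous chunks, and each chunk is then processed on its own, the way one task would handle one partition.

```python
from typing import List


def split_into_partitions(data: List[int], num_partitions: int) -> List[List[int]]:
    """Slice data into num_partitions contiguous chunks of near-equal size."""
    n = len(data)
    return [
        data[(i * n) // num_partitions:((i + 1) * n) // num_partitions]
        for i in range(num_partitions)
    ]


data = list(range(1, 11))                    # the numbers 1..10
partitions = split_into_partitions(data, 3)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
partial_sums = [sum(p) for p in partitions]  # one "task" per partition
total = sum(partial_sums)                    # combine the per-partition results
```

Each partition can be summed without looking at the others, which is what lets the work spread across machines; only the small partial results need to be brought together at the end.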

RDD => Array(thing1, thing2, thing3) : You can think of RDDs as a bunch of things.

Guys, if you have any other mess and want me to cheatsheet it for you, just comment below. I would also highly appreciate any comments about this post, so please give me feedback!

