Executors keep a cache so that tasks can run faster.
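As a sketch of how that cache gets used (assuming an existing `SparkContext` named `sc`, e.g. from `spark-shell`; the file path is hypothetical), you can mark an RDD you reuse so executors keep it in memory after the first computation:

```scala
import org.apache.spark.storage.StorageLevel

// Assumes an existing SparkContext `sc`; the path is hypothetical.
val logs   = sc.textFile("hdfs:///logs/app.log")
val errors = logs.filter(_.contains("ERROR"))

errors.cache()                                   // keep in executor memory once computed
// or pick a storage level explicitly:
// errors.persist(StorageLevel.MEMORY_AND_DISK)

val total  = errors.count()                      // first action computes and caches
val counts = errors.map(_.length).count()        // reuses the cached data
```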
Node => Multiple Executors: Each node runs multiple executors.
RDD => Big DataStructure: Its main strength is that it represents data too big to be stored on a single machine, so the data is distributed, partitioned, and split across computers.
Input => RDD: Every RDD is born out of some input, like a text file, Hadoop files, etc.
Output => RDD: Functions in Spark can produce an RDD as output. So it's one function after another: each receives an input RDD and produces an output RDD. It's functional.
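That input-to-output chain can be sketched like this (assuming an existing `SparkContext` named `sc`; the file paths are hypothetical):

```scala
// Assumes an existing SparkContext `sc`; paths are hypothetical.
val lines  = sc.textFile("input.txt")          // Input => RDD[String]
val words  = lines.flatMap(_.split("\\s+"))    // RDD in => RDD out
val pairs  = words.map(w => (w, 1))            // RDD[(String, Int)]
val counts = pairs.reduceByKey(_ + _)          // each function feeds the next
counts.saveAsTextFile("counts-out")            // RDD => Output
```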
RDD[Type] : RDDs are typed, they are data of a certain type (a key-value RDD is an RDD[(Key, Value)]).
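A quick sketch of that typing (again assuming an existing `SparkContext` named `sc`): the element type of each new RDD follows from the function you apply, and the compiler checks it.

```scala
import org.apache.spark.rdd.RDD

// Assumes an existing SparkContext `sc`.
val nums: RDD[Int]    = sc.parallelize(Seq(1, 2, 3))
val strs: RDD[String] = nums.map(n => s"#$n")  // element type tracks the function
// nums.map(_.toUpperCase)  // would not compile: Int has no toUpperCase
```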
RDD => 1,2,3: RDDs are ordered.
RDD => Zzzz: RDDs are lazily evaluated. We said functional, didn't we? So you can stack multiple transformations on your data, and only when you hit an action is the actual data computed.
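Here is a small sketch of that laziness (assuming an existing `SparkContext` named `sc`): the transformations only record a lineage, and nothing runs until the action at the end.

```scala
// Assumes an existing SparkContext `sc`.
val nums    = sc.parallelize(1 to 1000000)
val doubled = nums.map(_ * 2)             // transformation: nothing runs yet
val evens   = doubled.filter(_ % 4 == 0)  // still just a recipe (lineage)

val n = evens.count()                     // action: now the whole chain executes
```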
RDD => Partitioned: RDDs are partitioned across servers; we said it's big data, so we need to partition it.
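You can see and control that partitioning directly, sketched here assuming an existing `SparkContext` named `sc`:

```scala
// Assumes an existing SparkContext `sc`.
val data = sc.parallelize(1 to 100, numSlices = 8)  // ask for 8 partitions
println(data.getNumPartitions)                      // how the data is split

val fewer = data.coalesce(4)      // shrink to 4 partitions without a full shuffle
// repartition(16) would reshuffle into more partitions
```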
RDD => Array(thing1, thing2, thing3) : You can think of an RDD as a bunch of things.
Guys, if you have any other mess you want me to turn into a cheatsheet, just comment below. I would also highly appreciate any comments about this post, so please give me feedback!