M S Dillibabu
May 31, 2022

Thanks, I got your question. As I already mentioned:

In Spark, the recommended size for each shuffle partition is between 100 and 200 MB, because an executor has to hold a partition's data in memory; if the executor cannot accommodate it in memory, that leads to spill-over to disk. Here the total shuffle data is 102400 MB (100 GB), so if we set the shuffle partition count to 480, each partition comes to around 213 MB.
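As a minimal sketch of applying that number (PySpark; the app name is mine, but `spark.sql.shuffle.partitions` is the standard Spark SQL setting):

```python
# Minimal sketch: set the shuffle partition count so each of 102400 MB
# of shuffle data lands near the 100-200 MB per-partition range.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shuffle-sizing").getOrCreate()

# 102400 MB / 480 partitions -> ~213 MB per partition.
spark.conf.set("spark.sql.shuffle.partitions", "480")
```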

Formula:

No of shuffle partitions = total size / target partition size

Here, partition size = 102400 MB / 480 partitions ≈ 213 MB
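To make the arithmetic concrete, here is a hypothetical helper (the function name and signature are mine, not a Spark API) that derives a partition count from a total size and a target partition size:

```python
import math

# Hypothetical helper: derive a shuffle partition count from the total
# data size and a target per-partition size, both in MB.
def shuffle_partitions(total_size_mb: int, target_partition_mb: int) -> int:
    return math.ceil(total_size_mb / target_partition_mb)

print(shuffle_partitions(102400, 200))  # 512 partitions at a 200 MB target
print(102400 / 480)                     # ~213.3 MB per partition with 480
```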

A few more insights/formulas for you:

No of partitions = no of tasks

No of cores = no of tasks that can execute in parallel

So the number of tasks is directly proportional to the number of partitions; a quick way to check both numbers is sketched below.
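A small sketch (PySpark, using the standard `getNumPartitions` and `defaultParallelism` accessors) that inspects both values:

```python
# Sketch: partitions determine the tasks in a stage; cores bound how
# many of those tasks run at the same time.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(0, 1_000_000)

print(df.rdd.getNumPartitions())              # no of partitions = no of tasks in a stage
print(spark.sparkContext.defaultParallelism)  # ~ total cores available for parallel tasks
```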
