Why does mapPartitionsWithIndex cause a shuffle in Spark?
I'm new to Spark. I'm looking into shuffling issues in a test application, and I don't understand why the mapPartitionsWithIndex method in my program causes a shuffle! As you can see in the picture, the initial RDD has two 16 MB partitions, but the shuffle write is 49.8 MB. I know that map, mapPartitions, and mapPartitionsWithIndex are not shuffling transformations the way groupByKey is, yet I can see that it causes a shuffle in Spark. Why?
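The premise in the question can be checked directly. The following is a minimal sketch, assuming spark-core is on the classpath and a local master is acceptable. It verifies that mapPartitionsWithIndex on its own creates a narrow `OneToOneDependency` and keeps the partition count unchanged, so by itself it should not produce a shuffle. The object name `NarrowCheck` and the sample data are hypothetical.

```scala
import org.apache.spark.{OneToOneDependency, SparkConf, SparkContext}

// Hypothetical helper object; not part of the original question.
object NarrowCheck {
  // Returns true if mapPartitionsWithIndex yields only narrow, one-to-one
  // dependencies and keeps the same number of partitions as its parent.
  def isNarrow(sc: SparkContext): Boolean = {
    // Two partitions, mirroring the two input partitions in the question.
    val input = sc.parallelize(1 to 10, numSlices = 2)
    // Tag every element with the index of the partition it lives in.
    val tagged = input.mapPartitionsWithIndex { (idx, it) => it.map(x => (idx, x)) }
    // A OneToOneDependency means partition i of `tagged` is computed only
    // from partition i of `input`, so no records are redistributed.
    tagged.partitions.length == input.partitions.length &&
      tagged.dependencies.forall(_.isInstanceOf[OneToOneDependency[_]])
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("narrow-check"))
    try println(s"mapPartitionsWithIndex is narrow: ${isNarrow(sc)}")
    finally sc.stop()
  }
}
```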
I think you are performing a join or group operation after mapPartitionsWithIndex, and that is what causes the shuffle. You can confirm this by modifying your code as follows.
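Current code:
val rdd = inputrdd1.mapPartitionsWithIndex(....)
val outrdd = rdd.join(inputrdd2)   // the join is what redistributes records by key

Modified code:
val rdd = inputrdd1.mapPartitionsWithIndex(....)
println(rdd.count)   // count is an action with no join, so no shuffle write is expected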
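Instead of editing the job, you can also inspect the lineage. The following is a minimal sketch, assuming spark-core is on the classpath. `inputrdd1` and `inputrdd2` are hypothetical stand-ins for the inputs in the question, and the mapPartitionsWithIndex body is a placeholder because the real function is elided there. The helper walks the dependency graph looking for a `ShuffleDependency`. It should find none before the join and one after it. Spark most likely labels the map-side stage of the join's shuffle after its last narrow transformation (here, mapPartitionsWithIndex), and this would explain why the shuffle write appears to belong to that method in the UI.

```scala
import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; not part of the original answer.
object ShuffleLineageCheck {
  // True if any RDD in the lineage depends on its parent through a shuffle.
  def hasShuffle(rdd: RDD[_]): Boolean =
    rdd.dependencies.exists {
      case _: ShuffleDependency[_, _, _] => true
      case dep                           => hasShuffle(dep.rdd)
    }

  // Returns (shuffle before the join, shuffle after the join).
  def check(sc: SparkContext): (Boolean, Boolean) = {
    // Hypothetical inputs standing in for inputrdd1 / inputrdd2.
    val inputrdd1 = sc.parallelize(Seq(1 -> "a", 2 -> "b", 3 -> "c"), 2)
    val inputrdd2 = sc.parallelize(Seq(1 -> "x", 3 -> "y"), 2)
    // Placeholder body: appends the partition index to each value.
    val rdd = inputrdd1.mapPartitionsWithIndex { (idx, it) =>
      it.map { case (k, v) => (k, s"$v@$idx") }
    }
    // Neither input has a partitioner, so the join must shuffle both sides.
    val outrdd = rdd.join(inputrdd2)
    // In toDebugString output, a "+-" prefix marks a shuffle boundary.
    println(outrdd.toDebugString)
    (hasShuffle(rdd), hasShuffle(outrdd))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("shuffle-check"))
    try println(check(sc))
    finally sc.stop()
  }
}
```

If the first flag is false and the second is true, the shuffle comes from the join and not from mapPartitionsWithIndex.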